Abstract

A transmitter Alice may wish to reliably transmit a message to a receiver Bob over a binary symmetric channel
(BSC), while simultaneously ensuring that her transmission is deniable from an eavesdropper Willie. That is, if
Willie is listening to Alice's transmissions over a "significantly noisier" BSC than the one to Bob, he should be
unable to estimate even whether Alice is transmitting. Even when Alice's (potential) communication scheme is
publicly known to Willie (with no common randomness between Alice and Bob), we prove that over n channel uses
Alice can transmit a message of length $O(\sqrt{n})$ bits to Bob, deniably from Willie. We also prove the
information-theoretic order-optimality of our results.
I. Introduction

Alice is in jail, and may wish to communicate reliably with Bob in the neighboring cell, over n uses
of a noisy BSC (if she stays silent, the input to the channel is all zeroes). Unfortunately, the warden
Willie is monitoring Alice (though his observations are significantly noisier, since his CCTV camera is
low-quality).^1 Willie only wishes to detect Alice's "transmission status" (i.e., he only wants to know
whether she's talking or not, and doesn't necessarily care what she's saying). Hence Alice wishes to use
a communication scheme that is "deniable from Willie", i.e., Willie's best estimate of Alice's transmission
status should be essentially statistically independent of his observations.^2

In this work we demonstrate:

1) Deniability - outer bound on codeword weight: If the binary code Alice uses to encode her message
contains a substantial fraction of "high-weight codewords" (that is, codewords whose weight is
$\omega(\sqrt{n})$ over n channel uses), then her communication scheme cannot be deniable. In
particular, Willie can simply count the number of non-zero symbols he observes, and compare this
number with a simple function of channel parameters, to estimate, fairly accurately, Alice's
transmission status. Hence, for deniability from Willie, Alice's code should consist mostly of
"low-weight codewords".

2) Reliability/deniability - outer bound on throughput: In our model the communication link from
Alice to Bob is also a BSC (albeit a somewhat less noisy channel than the one from Alice to
Willie). We use information-theoretic inequalities to demonstrate that for any code that satisfies
the outer bound on codeword weight required for deniability, if reliable decoding by Bob is also
required, then a message of at most $O(\sqrt{n})$ bits can be transmitted by Alice over n channel uses.^3

^1 If the channel from Alice to Willie is at least as good as the channel from Alice to Bob, then clearly no communication that is
simultaneously reliable and deniable is possible, since Willie can use whatever decoding strategy Bob can use.

^2 For a more "communication system" inspired motivation, consider a setting where a spy drone flying over enemy territory wishes to
transmit its observations to its base, without the enemy even knowing whether or not someone is spying on it. If the drone uses a directional
antenna, the effective signal to noise ratio observed by the enemy may be significantly lower than that observed by the drone's base.

3) Reliability/deniability - achievable scheme: If Willie's BSC is "sufficiently noisier" (in a precise
sense that we quantify later) than Bob's BSC, then we design a communication scheme (publicly
known to all parties - Alice, Bob and Willie) such that:
• Throughput: It encodes a message with $O(\sqrt{n})$ bits in n channel uses (or, when the channel to
Bob is noiseless (though the channel to Willie is still noisy), $O(\sqrt{n} \log n)$ bits).
• Reliability: It enables Bob to correctly reconstruct Alice's message with high probability.
• Deniability: Willie's best estimate of Alice's transmission status is essentially statistically
independent of his observations of his channel.

4) Deniability - lower bound on code parameters: Surprisingly, we are also able to show, for any 

deniable code, a lower bound on its number of codewords (as a function of the code's structural 

properties). For deterministic codes, this implies a lower bound on its throughput.^4 This lower bound

arises by the following observations. Suppose Alice only uses "somewhat high-weight" codewords 

(say her codebook consists entirely of codewords of weight Vtrf for some e > 0) - this is not a bad 

idea, since the second result above demonstrates that having too many low-weight codewords results 

in a code being unreliable. Then we demonstrate that if Willie uses an estimator that is an analogue 

of minimum distance decoding he can accurately estimate Alice's transmission status (even if he 

cannot reconstruct her message). 

The first two results above are analogues (for the scenario of the BSCs considered in this work) of
theorems in recent work that motivated our work (in particular, the corresponding results for Additive
White Gaussian Noise (AWGN) channels proved in [1], [2]). The last two, corresponding to the construction
of "reliable and deniable public codes" and novel bounds on the structure of any such codes, are entirely
new.

In particular, we stress again that in our model (unlike the models in most prior work) everything 
that Bob knows a priori about Alice's communication scheme, Willie also knows - there is no common 
randomness that is hidden from Willie that Alice and Bob can leverage. The only asymmetry between 
Bob's and Willie's estimation abilities arises from the fact that Willie's observations of Alice's (possible) 
transmissions are noisier than Bob's. Hence the fact that we demonstrate the existence of public codes 
satisfying Result 3 above is a significant strengthening of the model in [1], [2], wherein in general common
randomness is required, and consumed at a rate greater than the throughput of the reliable/deniable 
communication in the first place! 

Also, in our model (and also in the model of [1], [2], but not in the vast majority of steganographic
models), Alice's default transmission, if she has nothing to say, is nothing. This default silence of Alice
makes it challenging to hide the fact that she is not silent when she actually has something to say. The
only reason we are able to achieve a non-zero throughput is due to the fact that Willie's observations of
Alice's potential transmissions are noisy (and in particular, significantly noisier than Bob's). Hence the
subtitle of this work - "hiding messages in noise".

^3 Note that this implies that Alice's rate decays to 0 asymptotically in n. Hence in this work we usually scale Alice's "throughput" (the
number of bits in her message) with respect to $\sqrt{n}$, to obtain a quantity we call the "relative throughput".

^4 Of course, Alice could choose to use a stochastic encoder, wherein the same message could be randomly assigned to one of many
different codewords. In this scenario our lower bound would only apply to the number of codewords, not the entropy of Alice's message.

Result 4 is also, to the best of our knowledge, entirely novel. Similar results do not hold in the setting 
with common randomness between Alice and Bob - in that model (for instance that of [1], [2]), high
deniability does not impose a lower bound on the rate of communication. 

II. Related work 

The problem we consider is a variant of the classical steganography problem, but with important 
differences in the model that both make our results more "realistic" in some settings, and also technically 
more challenging to prove non-trivial results about. 

A. Steganography 

The problem of steganography (broadly defined as "hiding an undetectable message in plain sight") is 
rooted in antiquity - brief but colourful historical perspectives on a variety of steganographic models and 
methods (including various techniques used by Xerxes, Herodotus, Mary Queen of Scots, and Margaret 
Thatcher, and even one which involves killing dogs...) can be found in [3] and [4]. Information-theoretic
models presenting "modern, formal characterizations" of steganography problems started appearing in the
literature in the 1980's and 1990's - among many others the works of Simmons [5] (who formalized the
"Prisoners' problem"), and Cachin [6] and Maurer [7] (who drew the connection between steganography
and another classical problem - hypothesis testing) come to mind. More recent and fairly comprehensive
compendiums of results on the theory and practice of steganography can be found in the books [8] and [9].

However, the vast majority of steganographic models make at least one of the following assumptions 
(none of which we make):

• (A1) Non-zero covertext/stegotext: In almost all works in the literature, Alice has access to a
length-n sequence (the "covertext") drawn from some distribution (this distribution, but not the 
actual value of the covertext, is known to Bob and Willie). The assumption is that after observing 
the covertext, Alice is allowed to transmit some (slightly) perturbed value of the covertext, called 
the "stegotext", over the channel, and both Bob and Willie observe this stegotext (or some further 
"perturbed" version of it). The critical point here is that Alice's default transmission (even if she 
has no hidden message to transmit to Bob) is usually non-zero, by this assumption. One example 
could be if Alice is allowed to upload photographs onto her website - this activity looks innocuous 
enough, and Willie might find it challenging to estimate whether or not Alice is hiding some message 
to Bob in those photographs. It is this "haystack" of covertexts/stegotexts that many steganographic 
algorithms leverage, to hide a "needle" of a hidden message in. A plethora of works characterize the
"capacity" of various steganographic problems - see for instance [10]-[14]. An important exception
to the non-zero covertext assumption occurs in the work of Bash, Goeckel and Towsley [1], [2] - we
discuss this work in depth below. 

• (A2) Shared secret key/common randomness: Kerckhoffs' principle of cryptography [15] states,
roughly, that "A cryptosystem should be secure even if everything about the system, except the key,
is public knowledge". This is just as true for the problem of steganography.

However, a significant number of steganographic protocols violate this precept by requiring a key
(that is often almost as large as the message being communicated) to be shared between Alice and
Bob, and kept secret from Willie, in advance of any communication. A variety of examples of
such protocols can be found in, for example, [9] or [10].

Such a key certainly helps considerably - it allows Alice and Bob to coordinate which of potentially
many codes to use to communicate, and Willie is left in the dark regarding this choice. The work
that is closest in spirit to our work, that of [1], [2], differs critically from ours in this assumption -
their protocol requires Alice to consume $\Omega(n)$ bits of a secret key shared with Bob, for her to be
able to communicate to Bob a hidden message with $O(\sqrt{n})$ bits.

However, not all works make this assumption. Some exceptions to this assumption of a shared secret
include [10], [11].^5

• (A3) Noiseless communication: Some works consider a model wherein the communication channel
between Alice and Bob is noiseless. This has some important consequences - in some such scenarios,
the optimal throughput can sometimes be boosted by a multiplicative factor of log n (for instance [9,
Chapters 8 and 13]).

However, in a variety of other works the channel from Alice to Bob may have noise. In some models
this may be due to an actively jamming warden (for instance [10]).^6 In other models this may simply
be random channel noise (for instance the work of [1], [2], and our work here).
The effect of noise on the channel from Alice to Willie is less well investigated - again, the work
of [1], [2], and our work here, are the only ones that we are aware of.

B. The Square Root Law 

The "Square Root Law" (often abbreviated as SRL in the literature) can perhaps be characterized as
an observation that in a variety of steganographic models, the throughput (the length of the message
that Alice can communicate deniably and reliably with Bob) scales as $O(\sqrt{n})$ (here n is the number of
"channel uses" that Alice has access to).

^5 The scheme in the latter work has the property that the stegotext could have a large "distortion" when compared with the covertext
(even though its statistics are identical, and hence the scheme is perfectly deniable). Various other works (for instance [10]) also impose
"maximum distortion" criteria in addition to reliability and deniability. In our model this is a moot point. For one, our covertext is all-zero.
For another, our codes have the property that Alice's transmissions use "very low-weight" codewords, hence automatically satisfying most
natural distortion criteria with respect to the all-zero codeword.

^6 There is a significant amount of work dealing with communication in the presence of "oblivious adversaries" [16] (or equivalently,
Arbitrarily Varying Channels (AVCs) [17]-[21]). In this line of work it is shown, for instance, that even if an adversary can flip up to $pn$
bits on the channel from Alice to Bob, as long as the jammer does not know what codeword was transmitted by Alice (he is "oblivious"),
his most harmful jamming pattern is essentially no worse than random noise generated by a BSC(p). Note that if one succeeds (as we do,
in this work) in constructing public codes that are highly deniable (and also highly reliable against random noise), then it might be reasonable
to model Willie as an oblivious adversary, and hope to prove that our codes are simultaneously highly deniable, and highly reliable against
jamming noise inserted by an active warden. However, this line of thought fails for the following reason. Our first result demonstrates that
for the code to be highly deniable, most of its codewords must be "very low weight". Since the code is public, Willie the active warden
(even though he may well be oblivious to Alice's transmissions) can simply insert on the channel from Alice to Bob a low-weight jamming
pattern that corresponds to a legitimate codeword that Alice could have chosen to transmit to Bob. Hence Willie succeeds in causing a
significant probability of decoding error for Bob, since there is no way for Bob to distinguish between Alice's transmissions and Willie's
jamming patterns. Such an attack (corresponding to the symmetrizability condition in AVC outer bounds) succeeds here since the deniability
requirement imposes a constraint on Alice's codebook (low-weight codewords) that makes it easy for an active warden to attack. However,
in the setting with private codes (in which Alice and Bob share a private key), this argument does not work. Hence the results of [10].



"Steganographic capacity is a loosely-defined concept, indicating the size of payload which 
may securely be embedded in a cover object using a particular embedding method. What 
constitutes "secure" embedding is a matter for debate, but we will argue that capacity should 
grow only as the square root of the cover size under a wide range of definitions of security." - 



This observation seems to have some empirical support in the community of people implementing
"real-world" steganographic protocols (for instance, see [23]). This "law" is heuristically justified via the
following reasoning:

"Thanks to the Central Limit Theorem, the more covertext we give the warden, the better he
will be able to estimate its statistics, and so the smaller the rate at which [the steganographer]
will be able to tweak bits safely." - [24]

"[T]he reference to the Central Limit Theorem... suggests that a square root relationship
should be considered." - [22]

Some recent work (for instance [12], [25]) has begun to theoretically justify this law under some (fairly
restrictive) assumptions on the class of steganographic protocols. Nonetheless, results in this class should
still be taken with a pinch of salt, since they do not offer a universally robust characterization for all
models which may be of interest. For instance, in some works (for instance [9, Chapters 8 and 13])
the throughput scales as $O(\sqrt{n} \log n)$. More drastically, the works of [10] (which gives an
information-theoretically optimal characterization of the rate-region of many variants of the steganography
problem) and [11] (which designs computationally efficient steganography protocols) both allow throughput
that scales linearly in n, rather than as $O(\sqrt{n})$ as would be indicated by the SRL. The major difference
between the models of [10], [11], and those that satisfy the SRL, seems to lie in a disagreement as to what
comprises "realistic" steganographic algorithms.

We note that in our setting (and also that of [1], [2]), our throughput does indeed provably scale as the
square root of the number of channel uses. However, the critical reason underlying this scaling is that we
consider the scenario wherein the covertext is all-zero - Alice must "whisper very softly", since she has 
no excuse if Willie hears something that cannot be explained by the noise on the channel to him. 
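
The "whisper very softly" intuition can be checked numerically. The sketch below is only an illustration and not part of the analysis in this work: it assumes Willie looks solely at the Hamming weight of what he observes over a BSC with noise parameter `p_w`, and compares the mean shift caused by a codeword of weight `w` with the standard deviation of the channel noise. The function name `normalized_weight_shift` is hypothetical.

```python
import math


def normalized_weight_shift(n, w, p_w):
    """Mean shift of Willie's observed weight, in units of noise standard deviations.

    Illustrative assumption: Willie only inspects the Hamming weight of his
    observation. A codeword of weight w raises the expected observed weight
    from n*p_w to n*p_w + w*(1 - 2*p_w); the noise standard deviation is
    sqrt(n*p_w*(1 - p_w)).
    """
    shift = w * (1 - 2 * p_w)
    noise_std = math.sqrt(n * p_w * (1 - p_w))
    return shift / noise_std


if __name__ == "__main__":
    p_w = 0.3
    for n in (10**4, 10**6, 10**8):
        soft = normalized_weight_shift(n, round(math.sqrt(n)), p_w)  # weight ~ sqrt(n)
        loud = normalized_weight_shift(n, round(n ** 0.75), p_w)     # weight ~ n^(3/4)
        print(n, round(soft, 3), round(loud, 3))
```

Under these assumptions, codewords of weight proportional to $\sqrt{n}$ shift the observed weight by a constant number of standard deviations regardless of n, whereas heavier codewords produce a shift that grows with n and eventually stands out from the noise.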

C. The work of Bash, Goeckel and Towsley [1], [2]

The results and techniques closest to those in this work (and indeed the starting-point of our
investigations) are those of [1], [2]. However, there are important differences in the models.

• Public codes vs. shared secret keys: The critical difference between our model and that of [1], [2]
(and the reason we state that our model is more "realistic") is that in our setting there is no shared
secret key between Alice and Bob that is hidden from Willie. Hence our codes are "public". A setting
wherein Alice's consumption of secret keys happens significantly faster ($\Omega(n)$) than her throughput
($O(\sqrt{n})$) to Bob (as in [1], [2]) is not sustainable. The reason we are able to achieve such performance
is due to a more intricate analysis of random binary codes than is carried out in [1], [2] - novel and
intricate analysis of concentration inequalities was required.



• Discrete vs. continuous channels: In our work all channels are discrete (finite input and output 
alphabets) - in particular, for ease of presentation of our results we focus on the case wherein 
Alice's transmissions pass through independent BSCs to get to Bob and Willie (though our results 
may be directly extended to larger classes of discrete channels). In contrast, the results of [1], [2] are
for channels wherein the noise is AWGN. It is conceivable that our construction of public codes also
carries over to the AWGN model of [1], [2], but significant extensions would be required to translate
our techniques from the discrete world over to the continuous version. 

III. Model 

A. Notational Conventions 

Calligraphic symbols such as $\mathcal{C}$ denote sets. Boldface upper-case symbols such as $\mathbf{X}$ denote random
variables, boldface lower-case symbols such as $\mathbf{x}$ denote particular instantiations of those random variables.
Vectors are denoted by an arrow above a symbol, such as in $\vec{x}$. For notational convenience, in this work,
unless otherwise specified, all vectors are of length n, where n corresponds to the block-length (number
of channel uses). Probabilities of events are denoted with a subscript denoting the random variable(s) over
which the probabilities are calculated. All logarithms in this work are binary, unless otherwise stated. The
Hamming weight (number of non-zero entries) of a vector $\vec{x}$ is denoted by $wt_H(\vec{x})$, and the Hamming
distance between two vectors $\vec{x}$ and $\vec{y}$ of equal length (the number of corresponding entries in which
$\vec{x}$ and $\vec{y}$ differ) is denoted by $d_H(\vec{x}, \vec{y})$. For any two numbers a and b in the interval [0, 1], we use
$a * b$ to denote binary convolution of these two numbers, defined as $a(1 - b) + b(1 - a)$ - this corresponds
to the noise parameter of the BSC comprising a BSC(a) followed by a BSC(b). As is standard in an
information-theoretic context, the notation $H(\cdot)$ corresponds to the (binary) entropy function, $H(\cdot|\cdot)$ to
conditional entropy, $I(\cdot;\cdot)$ to mutual information, and $D(\cdot\|\cdot)$ to the Kullback-Leibler divergence between
two distributions.
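
As a small illustration of these conventions, the following sketch implements the Hamming weight, the Hamming distance, binary convolution and the binary entropy function for vectors stored as Python lists of 0s and 1s. The function names are our own choices for illustration and do not appear elsewhere in this work.

```python
import math


def hamming_weight(x):
    """wt_H(x): number of non-zero entries of the vector x."""
    return sum(1 for bit in x if bit != 0)


def hamming_distance(x, y):
    """d_H(x, y): number of positions in which x and y differ."""
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    return sum(1 for a, b in zip(x, y) if a != b)


def binary_convolution(a, b):
    """a * b = a(1 - b) + b(1 - a): noise parameter of a BSC(a) followed by a BSC(b)."""
    return a * (1 - b) + b * (1 - a)


def binary_entropy(p):
    """H(p) in bits, with the usual convention H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)
```

For example, cascading a BSC(0.1) with a BSC(0.2) gives a BSC with noise parameter 0.1(0.8) + 0.2(0.9) = 0.26, and convolving any a with 1/2 gives 1/2.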

B. Communication System 

The transmitter Alice is connected via a binary-input binary-output broadcast medium to the receiver
Bob and the warden Willie. The channel from Alice to Bob is a Binary Symmetric Channel with
crossover probability $p_b$ (henceforth denoted BSC($p_b$)). The channel from Alice to Willie is a BSC($p_w$).
By assumption, the noise on the two channels is independent, $p_b < p_w$, and all parties (Alice, Bob and
Willie) know the channel parameters $p_b$ and $p_w$.
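
The broadcast medium can be simulated in a few lines. The sketch below is a minimal illustration under the stated assumptions (independent bit flips on the two links, with Bob's crossover probability strictly smaller than Willie's); the names `bsc` and `broadcast` are hypothetical helpers introduced here.

```python
import random


def bsc(x, p, rng):
    """Pass the binary vector x through a BSC(p): flip each bit independently with probability p."""
    return [bit ^ (1 if rng.random() < p else 0) for bit in x]


def broadcast(x, p_b, p_w, rng):
    """Return (y_b, y_w): Bob's and Willie's observations of x over independent BSCs."""
    if not (0.0 <= p_b < p_w <= 1.0):
        raise ValueError("the model assumes 0 <= p_b < p_w <= 1")
    y_b = bsc(x, p_b, rng)  # noise vector for Bob
    y_w = bsc(x, p_w, rng)  # independent, noisier noise vector for Willie
    return y_b, y_w


if __name__ == "__main__":
    rng = random.Random(0)
    silent = [0] * 20  # Alice's channel input when she stays silent
    y_b, y_w = broadcast(silent, 0.05, 0.3, rng)
    print(sum(y_b), sum(y_w))
```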

Alice (potentially) wishes to communicate a message m, chosen uniformly at random from a set $\{1, \ldots, N\}$,
to Bob - the symbol M denotes the random variable corresponding to Alice's message. For notational
convenience we say that if Alice does not wish to communicate with Bob, her message is 0. Equivalently,
if Alice does have a message she wishes to communicate to Bob, then a certain arbitrary binary variable
T equals 1. Otherwise, T equals 0. Only Alice knows the value of T a priori.

Alice encodes each message $m \in \{1, \ldots, N\}$ into a length-n binary codeword $\vec{x}_m = (x_m(1), \ldots, x_m(n))$
using an encoder $Enc(\cdot) : \{0\} \cup \{1, \ldots, N\} \to \{0, 1\}^n$. To simplify notation, we often denote $\vec{x}_m$ and
$\vec{X}_m$ as $\vec{x}$ and $\vec{X}$ respectively, wherever it does not cause confusion. The encoder is required to satisfy the
condition that the message 0 is always encoded to the length-n zero-vector (denoted by the codeword $\vec{x}_0$).
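
A minimal sketch of an encoder with this interface is given below. It is an illustration only: the codebook is drawn by setting each bit to 1 independently with some probability `rho`, which is an assumption made here for concreteness, and the helper names `random_codebook` and `make_encoder` are hypothetical.

```python
import random


def random_codebook(num_messages, n, rho, rng):
    """Draw num_messages length-n binary codewords, each bit being 1 with probability rho.

    Illustrative assumption: a random draw may occasionally produce the
    all-zero vector, which a practical construction would have to exclude.
    """
    return [[1 if rng.random() < rho else 0 for _ in range(n)]
            for _ in range(num_messages)]


def make_encoder(codebook, n):
    """Return Enc: {0} U {1, ..., N} -> {0, 1}^n for the given codebook."""
    def enc(m):
        if m == 0:
            return [0] * n  # message 0 (silence) always maps to the zero vector
        if not 1 <= m <= len(codebook):
            raise ValueError("message index out of range")
        return codebook[m - 1]
    return enc
```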
Fig. 1. System diagram: Depending on her transmission status T, Alice either broadcasts the all-zero vector $\vec{0}$, or encodes
her message M into a codeword $\vec{X} = Enc(M)$ from her codebook $\mathcal{C}$. The codebook has $2^t = 2^{r\sqrt{n}}$ codewords, where
the throughput t scales as $O(\sqrt{n})$. Bob receives $\vec{Y}_b$ over a BSC($p_b$), and Willie receives a noisier version $\vec{Y}_w$ over a
BSC($p_w$), where $p_b < p_w$. It is desired that the codebook $\mathcal{C}$ be both reliable (i.e. Bob is able to decode M as $\hat{M}$ with a "small"
probability of error) and deniable (i.e. Willie's observations give him essentially no information about Alice's transmission status).
We summarize the notation in this system diagram in tabular form in Figure 2.



The set $\{\vec{x}_1, \ldots, \vec{x}_N\}$ of possible non-zero codewords that are the outputs of Alice's encoder is denoted
by a specific codebook $\mathcal{C}$. The throughput t of Alice's codebook is defined as $\log N$, and the relative
throughput r of Alice's codebook is defined as $\log N/\sqrt{n}$. We note that in this work, the throughput of
the codes we consider typically^7 scales with the block-length n as $O(\sqrt{n})$. This corresponds to a "rate"
that decays to zero as n increases without bound, rather than converging to a constant as is common in
many other communication settings. Hence we deliberately consider the relative throughput rather than
the rate of our codes.
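
The difference between the rate and the relative throughput is easy to see numerically. The sketch below is illustrative only: it fixes a relative throughput of 0.5 (an arbitrary value chosen here) and prints the throughput, relative throughput and rate for growing block-lengths. The function names are hypothetical.

```python
import math


def throughput(num_messages):
    """t = log N, in bits (all logarithms are binary)."""
    return math.log2(num_messages)


def relative_throughput(num_messages, n):
    """r = log N / sqrt(n)."""
    return math.log2(num_messages) / math.sqrt(n)


def rate(num_messages, n):
    """Conventional rate log N / n, which decays to zero when t = O(sqrt(n))."""
    return math.log2(num_messages) / n


if __name__ == "__main__":
    r = 0.5  # arbitrary illustrative relative throughput
    for n in (100, 10_000, 1_000_000):
        num_messages = 2 ** int(r * math.sqrt(n))
        print(n, throughput(num_messages),
              relative_throughput(num_messages, n), rate(num_messages, n))
```

With the relative throughput held fixed, the number of message bits grows as $\sqrt{n}$ while the rate shrinks as $1/\sqrt{n}$.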

Bob receives the length-n binary vector $\vec{Y}_b$. Here $\vec{Y}_b = \vec{X} \oplus \vec{Z}_b$, where $\vec{Z}_b$ denotes the noise added by
the BSC($p_b$) channel between Alice and Bob. Bob uses his decoder $Dec(\cdot) : \{0, 1\}^n \to \{0\} \cup \{1, \ldots, N\}$
to generate his estimate of Alice's message as $\hat{M} = Dec(\vec{Y}_b)$. Bob's probability of decoding error when
Alice is transmitting, $P_{e,T=1}$, is defined as $\Pr_{M,\vec{Z}_b}(\hat{M} = 0 | T = 1) + \Pr_{M,\vec{Z}_b}(\hat{M} \neq M | T = 1)$. Bob's
probability of decoding error when Alice is not transmitting, $P_{e,T=0}$, is defined as $\Pr_{\vec{Z}_b}(\hat{M} \neq 0 | T = 0)$.
Bob's overall probability of decoding error, $P_e$, is defined as $P_{e,T=1} + P_{e,T=0}$.^8 We say Alice's codebook
$\mathcal{C}$ is $(1 - \epsilon)$-reliable if Bob's probability of decoding error is less than $\epsilon$.

Willie knows a priori both $Enc(\cdot)$ and $Dec(\cdot)$ (and hence also $\mathcal{C}$). Willie receives the length-n binary
vector $\vec{Y}_w$. Here $\vec{Y}_w = \vec{X} \oplus \vec{Z}_w$, where $\vec{Z}_w$ denotes the noise added by the BSC($p_w$) channel between
Alice and Willie. Willie uses his estimator $Est_{\mathcal{C}}(\cdot) : \{0, 1\}^n \to \{0, 1\}$ to generate his estimate of Alice's
transmission status as $\hat{T} = Est_{\mathcal{C}}(\vec{Y}_w)$.

We use a hypothesis-testing metric to quantify the deniability of Alice's codebook $\mathcal{C}$. Let the probability
of false alarm $\Pr_{\vec{Z}_w}(\hat{T} = 1 | T = 0)$ be denoted by $\alpha(Est_{\mathcal{C}}(\cdot))$. Analogously, let the probability of missed
detection $\Pr_{M,\vec{Z}_w}(\hat{T} = 0 | T = 1)$ be denoted by $\beta(Est_{\mathcal{C}}(\cdot))$ (and for a specific transmitted message m,
we denote $\Pr_{\vec{Z}_w}(\hat{T} = 0 | M = m)$ by $\beta^{(m)}(Est_{\mathcal{C}}(\cdot))$). The quantities $\alpha(Est_{\mathcal{C}}(\cdot))$ and $\beta(Est_{\mathcal{C}}(\cdot))$ denote
respectively the probability that Willie guesses Alice is transmitting even if she is not, and the probability
that Willie guesses Alice is not transmitting even though she actually is. We say Alice's codebook $\mathcal{C}$ is
$(1 - \epsilon)$-deniable if there is no estimator $Est_{\mathcal{C}}(\cdot)$ for Willie such that $\alpha(Est_{\mathcal{C}}(\cdot)) + \beta(Est_{\mathcal{C}}(\cdot)) < 1 - \epsilon$.^9
Where we can do so without confusion, we denote $\alpha(Est_{\mathcal{C}}(\cdot))$ and $\beta(Est_{\mathcal{C}}(\cdot))$ simply by $\alpha$ and $\beta$.

^7 But not always - in particular, if $p_b = 0$ or $p_w = 1/2$, different scaling laws apply. We discuss this in more detail in our main results.

^8 To simplify matters we consider here only deterministic encoders and decoders, rather than the more general stochastic encoders and
decoders, which are allowed to use private randomness not available to any other party. Examination of our techniques demonstrates that
our results do not change substantially even if we allow stochastic encoding/decoding.

Hamming distance (between two binary vectors): $d_H(\cdot, \cdot)$
Hamming weight (of a binary vector): $wt_H(\cdot)$
Throughput of a code, defined as $\log(N)$: t
Relative throughput of a code, defined as $t/\sqrt{n} = \log(N)/\sqrt{n}$: r
"Codebook generation" parameter, used to design a certain random ensemble of codes: $\rho$
Ensemble of random codebooks chosen by generating $2^{r\sqrt{n}}$ codewords with each bit 1 with probability $\rho$: C
A specific codebook, perhaps, but not necessarily, drawn from the ensemble C: $\mathcal{C}$
Number of messages Alice wishes to transmit to Bob: N
Number of channel uses (block-length): n
Willie's channel noise parameter: $p_w$
Bob's channel noise parameter: $p_b$
Alice's transmission status: T
Message random variable: M
Codeword (corresponding to message M), equals $\vec{0}$ if Alice did not transmit: $\vec{X}$
Noise vector for Willie: $\vec{Z}_w$
Noise vector for Bob: $\vec{Z}_b$
Willie's received (noisy) codeword, equals $\vec{X} \oplus \vec{Z}_w$: $\vec{Y}_w$
Bob's received (noisy) codeword, equals $\vec{X} \oplus \vec{Z}_b$: $\vec{Y}_b$
Willie's estimate of Alice's transmission status: $\hat{T}$
Bob's estimate of Alice's transmitted message (equals 0 if Alice did not transmit): $\hat{M}$
Willie's probability of false alarm, equals $\Pr_{\vec{Z}_w}(\hat{T} = 1 | T = 0)$: $\alpha$
Willie's probability of missed detection of codeword $\vec{x}$, equals $\Pr_{\vec{Z}_w}(\hat{T} = 0 | T = 1, \vec{X} = \vec{x})$: $\beta^{(\vec{x})}$
Willie's probability of missed detection, equals $\sum_{\vec{x} \in \mathcal{C}} \Pr_M(\vec{x}) \beta^{(\vec{x})}$: $\beta$
The deniability of codebook $\mathcal{C}$

Fig. 2. Notation used throughout the paper
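
The two error probabilities of any candidate estimator can be estimated empirically. The sketch below is a Monte Carlo illustration under simplifying assumptions made here (codewords are chosen uniformly, and the example estimator is a plain weight threshold); it estimates $\alpha$ and $\beta$ for one estimator only, whereas deniability is a statement about all estimators. All function names are hypothetical.

```python
import random


def bsc(x, p, rng):
    """Flip each bit of x independently with probability p."""
    return [bit ^ (1 if rng.random() < p else 0) for bit in x]


def weight_threshold_estimator(threshold):
    """Example estimator: guess T = 1 iff the observed weight exceeds the threshold."""
    def est(y_w):
        return 1 if sum(y_w) > threshold else 0
    return est


def estimate_alpha_beta(est, codebook, n, p_w, trials, rng):
    """Monte Carlo estimates of (alpha, beta) for the estimator est."""
    false_alarms = 0
    missed = 0
    for _ in range(trials):
        # T = 0: Alice is silent, Willie sees pure channel noise.
        false_alarms += est(bsc([0] * n, p_w, rng))
        # T = 1: Alice sends a uniformly chosen codeword.
        x = rng.choice(codebook)
        missed += 1 - est(bsc(x, p_w, rng))
    return false_alarms / trials, missed / trials
```

An estimator that ignores its input (always guessing 0, or always guessing 1) achieves $\alpha + \beta = 1$, so a sum close to 1 for every estimator is what high deniability asks for.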

For any block-length n, we say a corresponding codebook $\mathcal{C}$ is simultaneously $(1 - \epsilon)$-reliable and
$(1 - \epsilon)$-deniable if it simultaneously ensures that Bob's probability of decoding error is at most $\epsilon$, and
has deniability $1 - \epsilon$.

We summarize the major notation (graphically represented in Figure 1) in the table in Figure 2.



IV. Main results/high-level intuition 
We now present our main results, and the intuition behind the corresponding proof techniques. 
Fig. 3. Willie's observation if Alice does not transmit: The curve enclosing a partially gray-scale shaded region in the
upper of the two figures above is a pictorial representation of the set (of size $2^n$) of all possible $\vec{y}_w$ that Willie may observe
if Alice transmits nothing. In this scenario, Alice's transmission status T equals 0, hence her "transmitted codeword" equals
the zero vector (shown by the black dot at the left). In particular, the $\vec{y}_w$ are arranged in a partial order, so that vectors with
lower Hamming weight are to the left of vectors with higher Hamming weight, and the height of the enclosing curve denotes
the number of binary vectors of a particular Hamming weight, hence the shape of the enclosing curve is exactly the binary
entropy function. Given this, the shaded region denotes the set of "likely" $\vec{y}_w$ that Willie observes, with the darkness of the
shading denoting the likelihood. Note that since Alice transmits the all-zero vector, $\vec{y}_w$ is the output of a BSC($p_w$), hence
strings with lower Hamming weight are likelier than strings with higher weight. The probability of observing any particular
$\vec{y}_w$ is in general exponentially small in n - in particular, the probability of observing any particular "typical" $\vec{y}_w$ (with roughly
$p_w n$ ones) is approximately $2^{-nH(p_w)}$. Our proof techniques require us to work with such a probability distribution, over
an exponentially large alphabet $\{0, 1\}^n$, and with exponentially small values. However, since this can be hard to
visualize, we also plot the lower of the two curves. This curve at the bottom plots the probability distribution of observing
$\vec{y}_w$ of a particular Hamming weight - this can be thought of as the "projection" of the probability distribution $\Pr_{\vec{Z}_w}(\vec{y}_w)$
along the "all-ones" vector $\vec{1}$. That is, for each $i \in \{0, \ldots, n\}$, the curve at the bottom shows the probability
$\Pr_{\vec{Z}_w}(\{\vec{y}_w : \vec{y}_w \cdot \vec{1} = i\})$. Since Alice's transmitted codeword is $\vec{0}$, the "typical" $\vec{y}_w$ that Willie observes are of
weight approximately $p_w n$ (with a variation of $O(\sqrt{n})$, as can be shown by standard concentration inequalities such as the
Chernoff bound). Note that the curve at the bottom is "smooth" - it follows a binomial distribution. In fact, the "projection in
any direction $\vec{v}$" (defined as $\Pr_{\vec{Z}_w}(\{\vec{y}_w : \vec{y}_w \cdot \vec{v} = i\})$) of the probability distribution follows a binomial distribution. Standard
techniques also allow us to estimate that the maximum value of this probability distribution scales as $O(1/\sqrt{n})$.
Fig. 4. Willie's observations if Alice transmits: The curve enclosing a partially red-shaded region in the upper of the two
figures above is a pictorial representation of the set (of size $2^n$) of all possible $\vec{y}_w$ that Willie may observe if Alice transmits
a codeword uniformly at random from her codebook $\mathcal{C}$ (her transmission status T = 1). The codebook $\mathcal{C}$ comprises the
codewords shown as the black dots on the left - given that Alice's transmission status T is 1, each codeword is chosen
with probability $2^{-r\sqrt{n}}$. Given that Alice transmits a particular $\vec{x}$, the set of $\vec{y}_w$ that Willie is likely to observe is pictorially
represented by the red paraboloid region (denoted thus merely as a visual aid) extending rightwards from that $\vec{x}$. The overall
probability distribution over Willie's observed $\vec{y}_w$ is hence the "average" of the paraboloid regions. Again, for general $\vec{y}_w$
this probability distribution takes values that are exponentially small in n. Since, by design, codewords in our codebook $\mathcal{C}$
have expected Hamming weight $O(\sqrt{n})$, the "typical" $\vec{y}_w$ that Willie observes are of weight approximately $(p_w * \rho)n$, with a
range of about $O(\sqrt{\rho n})$. Since we choose $\rho = \Theta(1/\sqrt{n})$, this translates to codewords of weight about $\Theta(\sqrt{n})$, with a range
of about $\Theta(n^{1/4})$. The curve at the bottom plots the probability distribution of observing $\vec{y}_w$ of a particular weight (again,
similar behaviour is observed along other projections). In this case (unlike in Figure 3) the probability distribution on $\vec{y}_w$ is
somewhat "lumpy", since the probability that Willie observes a particular $\vec{y}_w$ depends on the distribution of the Hamming
distance between that particular $\vec{y}_w$ and the set of codewords $\vec{x} \in \mathcal{C}$. In particular, the probability distribution on $\vec{y}_w$ equals
$\Pr_{M,\vec{Z}_w}(\vec{y}_w | T = 1) = 2^{-r\sqrt{n}} \sum_{\vec{x} \in \mathcal{C}} \Pr_{\vec{Z}_w}(\vec{y}_w | \vec{x})$. So the weight distribution of $\vec{y}_w$ is a weighted sum of binomial distributions.
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Pr (wtnCyw)) 

M,Zw 



o(i/V^) 




WChCYw) 
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Fig. 5. A threshold estimator for Willie: If "too many" codewords in the codebook C have "large" Hamming weight, the 
two probabiUty distributions on y„, corresponding to T = and T = 1, respectively Prg (yu,|T = 0) and Pr^^ g (yu)|T = 
1), are "very different". In this case, Willie can detect Alice's transmission status fairly accurately by using the following 
"thereshold" estimator In particular, Willie's estimator outputs T = 1 if the weight of the observed codeword y„ is above a 
carefully chosen threshold, and T = otherwise. This is a visual depiction of the ideas behind Theorem [ij 



A. Deniability - outer bound on codeword weight 

Theorem 1 below proves an outer bound on the Hamming weights of codewords of any codebook $\mathcal{C}$
that has high deniability. The intuition, as pictured in Figure 5, is that if many codewords have "high"
Hamming weight, with non-negligible probability the Hamming weight of Willie's observed vector $\mathbf{y}_w$ will
be above a carefully chosen threshold. This theorem and the corresponding proof techniques are analogues
(for the BSCs considered in this paper) of Theorem 2 in [2] (derived there for AWGN channels).

Theorem 1. For any $p_w < 1/2$, if more than a $\gamma$-fraction of the codewords in $\mathcal{C}$ are of weight greater
than $c_1\sqrt{n}$, then the deniability of Alice's codebook $\mathcal{C}$ is less than $1 - \gamma + \gamma\,\frac{8p_w(1-p_w)}{c_1^2(1-2p_w)^2}$.

Note 1.1: If $p_w = 0$ (noiseless channel to Willie) then one would expect the deniability of any code
to be low. One can see this by setting $c_1 = 1/\sqrt{n}$ (corresponding to codewords of Hamming weight 1)
and $\gamma = 1$ (since each non-trivial codeword must have Hamming weight at least 1), resulting in an outer
bound of 0-deniability.

Note 1.2: On the other hand, if $p_w = 1/2$ (perfectly noisy channel to Willie) then one would expect the
deniability of any code to be high. As one would expect, our bound reflects this, since the bound is infinite
for any choice of $c_1$ and $\gamma$. This includes, for instance, setting $c_1 = \sqrt{n}$ (corresponding to codewords of
Hamming weight $n$) and $\gamma = 0$ (since each length-$n$ codeword must have Hamming weight at most $n$).
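
The two notes can be checked numerically. The sketch below assumes the outer bound has the form $1 - \gamma + \gamma \cdot 8p_w(1-p_w)/(c_1^2(1-2p_w)^2)$, which is the expression the proof of Theorem 1 arrives at; the function name `theorem1_bound` is hypothetical and chosen only for this illustration.

```python
def theorem1_bound(p_w, c1, gamma):
    """Outer bound on deniability, assuming the form
    1 - gamma + gamma * 8 p_w (1 - p_w) / (c1^2 (1 - 2 p_w)^2).

    p_w   : crossover probability of the channel to Willie, in [0, 1/2)
    c1    : weight threshold constant (codeword weight > c1 * sqrt(n))
    gamma : fraction of codewords above the weight threshold
    """
    if not 0.0 <= p_w < 0.5:
        raise ValueError("p_w must lie in [0, 1/2)")
    ratio = 8.0 * p_w * (1.0 - p_w) / (c1 ** 2 * (1.0 - 2.0 * p_w) ** 2)
    return 1.0 - gamma + gamma * ratio


if __name__ == "__main__":
    # Noiseless channel to Willie with gamma = 1: the bound collapses to 0.
    print(theorem1_bound(0.0, 0.01, 1.0))
    # As p_w approaches 1/2 the bound grows without limit (it is then vacuous).
    print(theorem1_bound(0.499, 1.0, 1.0))
```

The bound only carries information when it evaluates to less than 1; larger values simply mean that this particular threshold argument does not rule out high deniability.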



B. Reliability/deniability - outer bound on throughput 

Theorem 2 below provides an upper bound on the throughput $r$ of any code that simultaneously has high
reliability and deniability. The proof technique follows standard information-theoretic converse arguments,
though at one point we critically need to use the fact (proved in Theorem 1 above) that codes that have
high deniability do not have too many codewords of high Hamming weight. Again, this theorem and its
proof are analogues (for BSCs) of Theorem 2 in [2] (derived there for AWGN channels).

Footnote: Again, we restrict our attention to deterministic estimators - an averaging argument directly demonstrates that
given any stochastic estimator, there exists a deterministic estimator with at least equivalent performance for Willie.

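Theorem 2. For any sufficiently small $\epsilon$, if a codebook $\mathcal{C}$ is simultaneously $(1 - \epsilon)$-deniable and
$(1 - \epsilon)$-reliable, the relative throughput $r$ is at most

$$
r \le
\begin{cases}
0 & \text{if } p_w \le p_b, \\
2\sqrt{2}\,\sqrt{p_w(1-p_w)}\left(\frac{1-2p_b}{1-2p_w}\right)(1-2\epsilon)^{-3/2}\log\left(\frac{1-p_b}{p_b}\right) & \text{if } p_w > p_b \text{ and } p_b, p_w \in (0, 1/2), \\
\frac{\sqrt{p_w(1-p_w)}}{\sqrt{1/2-\epsilon}\,(1-2p_w)}\log(n) + O(1) & \text{if } p_b = 0, \\
\sqrt{n}\,(1 - H(p_b) + \epsilon) & \text{if } p_w = 1/2.
\end{cases}
$$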

Note 2.1: If either $p_b$ or $p_w$ is in the range $[1/2, 1]$, the corresponding receiver (Bob or Willie) can simply
flip the received codeword, and treat it as the output of a channel with parameter in the range $[0, 1/2]$;
hence we focus our attention on the case when both $p_b$ and $p_w$ are in the range $[0, 1/2]$. The case of
primary interest in this work (the second case in the theorem statement) is when both $p_b$ and $p_w$ are in the
range $(0, 1/2)$ and $p_w > p_b$. In this "realistic" scenario the relative throughput tends to a constant (i.e. the
throughput scales as $O(\sqrt{n})$). The other cases deal with scenarios when the channel to Willie is at least
as good as the channel to Bob (the first case in the theorem statement), in which case no throughput is
possible since Willie can simply employ the same decoder Bob would have, or when the channel to Willie
is "considerably worse" than the channel to Bob (the third and fourth cases in the theorem statement), in
which case the throughput scales faster than $O(\sqrt{n})$. In particular, in the third case the throughput scales
as $O(\sqrt{n}\log(n))$, corresponding to a relative throughput scaling as $O(\log(n))$ as given in the theorem
statement, and in the last case the throughput scales as $O(n)$, corresponding to a relative throughput scaling
as $O(\sqrt{n})$ as given in the theorem statement.

Note 2.2: Positive relative throughputs are attained in [1], [2] even when the channel to Willie is less noisy
than the channel to Bob. The reason this is possible in their setting is that the shared secret between
Alice and Bob allows for randomization from Willie.

C. Reliability/deniability - achievable scheme 

Next we state and prove one of the main results of this work - namely, that randomly chosen codes
(chosen from a suitable ensemble) are with high probability simultaneously highly reliable and highly
deniable. This type of code is a significant improvement over the corresponding code constructions in [1], [2].
This is because the code constructions there require Alice and Bob to have access to common randomness
(that is secret from Willie), whereas our codes are "public" (Willie knows exactly as much about the
codebook as Bob does). In fact, the amount of common randomness required for the constructions in [1], [2]
scales as $\Omega(\sqrt{n}\log(n))$ - note that the throughput of reliable/deniable communication for the problem
scales at most as $O(\sqrt{n})$. This means that Alice burns through common randomness shared with Bob at
a much faster pace than the throughput of the messages passed from her to Bob.

[Figure: plot of $\Pr_{\mathcal{C},M,\mathbf{Z}_w}(\mathbf{y}_w | T = 1)$ against $\mathrm{wt}_H(\mathbf{y}_w)$, with the horizontal axis marked at
$(p_w * p)n - O(\sqrt{n})$, $(p_w * p)n$ and $(p_w * p)n + O(\sqrt{n})$.]

Fig. 6. Deniability from Willie: Our proof that a random codebook $\mathcal{C}$ chosen with the "right" parameters (number of
codewords, expected weight of codewords) is highly deniable proceeds as follows. We need to demonstrate that the
probability distributions $\Pr_{\mathbf{Z}_w}(\mathbf{y}_w | T = 0)$ and $\Pr_{M,\mathbf{Z}_w}(\mathbf{y}_w | T = 1)$ are "close" (in variational distance). However,
since the latter distribution is complex (due to its dependence on the specific codebook $\mathcal{C}$), we do this comparison in two
stages. We first compute the ensemble distribution of $\mathbf{y}_w$, i.e., the "smooth blue" region/curve denoting the "ensemble
average" (over all suitably chosen random codebooks) of the probability distribution on $\mathbf{y}_w$. We then demonstrate that
the probability distribution $\Pr_{\mathbf{Z}_w}(\mathbf{y}_w | T = 0)$ and the ensemble distribution $\Pr_{\mathcal{C},M,\mathbf{Z}_w}(\mathbf{y}_w | T = 1)$ (i.e. the weighted
average over all possible codebooks $\mathcal{C}$ of the latter distribution) are "close". Finally, we prove that with high probability
over the choice of codebooks $\mathcal{C}$, the distribution $\Pr_{M,\mathbf{Z}_w}(\mathbf{y}_w | T = 1)$ is tightly concentrated around its expectation
$\Pr_{\mathcal{C},M,\mathbf{Z}_w}(\mathbf{y}_w | T = 1)$. This is a visual depiction of the proof technique underlying the proof of deniability in
Theorem 3.

Fig. 7. Bob's decoder: Since the noise parameter $p_b$ on the channel to Bob is smaller than the corresponding parameter $p_w$
on the channel towards Willie, as long as the throughput is not too high, it can be shown that the sets of $\mathbf{y}_b$ that are
"typical" with respect to the corresponding transmitted codewords $\mathbf{x}$ have relatively small intersection. Hence Bob can
reliably decode by performing minimum distance decoding.



The code constructions in [1], [2] use this common randomness between Alice and Bob in the following
way. First, they use the common randomness to coordinate which of an ensemble of possible codebooks
Alice actually uses to communicate with Bob. Since Bob knows the common randomness, he knows
which codebook Alice actually used. However, since the common randomness is kept secret from Willie,
from his perspective, if Alice transmits a non-zero codeword, the probability distribution on his received
$\mathbf{y}_w$ is that of a code ensemble average $\Pr_{\mathcal{C},M,\mathbf{Z}_w}(\mathbf{y}_w | T = 1)$. Using some elegant statistical properties
of this ensemble average distribution, and the distribution $\Pr_{\mathbf{Z}_w}(\mathbf{y}_w | T = 0)$ (corresponding to Willie's
observations if Alice does not transmit anything), [1], [2] demonstrate that Willie is essentially unable
to learn anything about the binary random variable $T$, since the two distributions "look very similar"
from Willie's observations of $\mathbf{y}_w$. Their proof can be essentially summarized in the statement that "the
ensemble average codebook is highly deniable". The challenge in extending their proof technique to a
public codebook is that this proof says nothing about the existence of a single, public, highly deniable
codebook.

Our key idea is to extend the analysis by proving that the actual distribution $\Pr_{M,\mathbf{Z}_w}(\mathbf{y}_w | T = 1)$ on $\mathbf{y}_w$
if Alice transmits a non-zero codeword is tightly concentrated about its ensemble average (this is visually
depicted in Figure 6). However, our first "naive" attempts in using standard concentration inequalities
were unsuccessful, since for any particular $\mathbf{y}_w$, the probability (averaged over Alice's choice of message,
channel noise, and over all codebooks) that Willie actually observes $\mathbf{y}_w$ is exceedingly small (decaying at
least exponentially in $n$). This means that standard concentration inequalities such as the Chernoff bound,
in which the probability of concentration depends on the expected value of the random variables under
consideration (i.e., the probability over $M$ and $\mathbf{Z}_w$ of observing a particular $\mathbf{y}_w$, viewed as a random variable
over $\mathcal{C}$), fail to give the required probability of concentration, since the expected value is too small.

Hence we proceed indirectly. We first note that it suffices to prove that $\Pr_{M,\mathbf{Z}_w}(\mathbf{y}_w | T = 1)$ converges
point-wise to its ensemble average for "typical" $\mathbf{y}_w$ (since the bulk of the probability mass of the ensemble
average distribution falls in a certain range). For any $\mathbf{y}_w$ in this range, we prove that the expected number
of codewords at a certain distance range (corresponding to the "typical" noise patterns $\mathbf{Z}_w$) of each $\mathbf{y}_w$
is super-polynomial. For random variables with such "large" (super-polynomial) expectations, standard
arguments suffice to prove concentration, failing only with probability that is super-exponentially small in $n$.
This allows us to show that with high probability over the ensemble, a randomly chosen codebook
satisfies the property that the numbers of codewords in "typical" Hamming shells around most "typical" $\mathbf{y}_w$
are tightly concentrated around their expectations. Book-keeping calculations then enable us to show that
this concentration in the distance-distribution of codewords translates to a pointwise concentration (again
failing only with super-exponentially small probability) of $\Pr_{M,\mathbf{Z}_w}(\mathbf{y}_w | T = 1)$ to its ensemble average. This
technique allows us to bypass the problem of the small expected values of the random variables of primary
interest (the probability of observing specific channel outputs if a specific codebook is used), by focusing
instead on random variables with "large expected values" (numbers of codewords of certain "types") that then
enable us to recover the random variables of primary interest. One calculation that requires some care
is that due to the low throughput of our codes (scaling as $O(\sqrt{n})$) we need to define our typical sets
carefully, to simultaneously ensure that they are high probability sets, but are also not "too large".

To complete the proof, we need to demonstrate that in fact a randomly chosen code (from the same
ensemble as used to generate the highly deniable code above) is also highly reliable with sufficiently high
probability (and hence a randomly chosen code is, with high probability, simultaneously highly deniable and
highly reliable). This follows from somewhat standard random coding arguments, if Bob decodes to the
nearest codeword. Since the noise in the channel from Alice to Bob is smaller than the noise in the channel
from Alice to Willie, the sets of $\mathbf{y}_b$ that are "typical" (with respect to channel noise $\mathbf{Z}_b$) with different
transmitted codewords have small intersection with each other. This is visually depicted in Figure 7.
Hence Bob is able to decode even though Willie cannot even tell whether Alice is transmitting or not.
One calculation that requires some care is due to the low expected weight of codewords (by construction
chosen to be about $O(\sqrt{n})$), and hence the notion of "typicality decoding" has to be somewhat delicately
defined.

We generate the ensemble of all codebooks $\mathcal{C}$ containing $2^{r\sqrt{n}}$ codewords of block-length $n$, with each
codeword generated by choosing each bit i.i.d. Bernoulli($p$) (that is, each bit is 1 with probability $p$), and
sample codebooks from this distribution (we call such codebooks random public codebooks). In what follows,
we set $p$ to equal $c_2/\sqrt{n}$, where $c_2$ is a code design parameter.

Footnote (on the "small expected values" above): Or, as George Walker Bush put it, "the soft bigotry of low expectations".
Footnote (on the "large expected values" above): Or, as Philip Pirrip might put it, we have "Great Expectations".
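
As a rough illustration of the ensemble just described, the following Python sketch samples one random public codebook. The function name `sample_public_codebook`, the `seed` argument and the rounding of $2^{r\sqrt{n}}$ to an integer are assumptions made for this example; they are not part of the construction in the text.

```python
import math
import random


def sample_public_codebook(n, r, c2, seed=None):
    """Sample a codebook with about 2^(r*sqrt(n)) codewords of length n.

    Each bit is drawn i.i.d. Bernoulli(p) with p = c2 / sqrt(n).
    Rounding the number of codewords to an integer is an assumption
    of this sketch.
    """
    rng = random.Random(seed)
    p = c2 / math.sqrt(n)
    if not 0.0 <= p <= 1.0:
        raise ValueError("c2 / sqrt(n) must lie in [0, 1]")
    num_codewords = max(1, round(2 ** (r * math.sqrt(n))))
    return [
        [1 if rng.random() < p else 0 for _ in range(n)]
        for _ in range(num_codewords)
    ]


if __name__ == "__main__":
    codebook = sample_public_codebook(n=400, r=0.3, c2=2.0, seed=1)
    mean_weight = sum(sum(cw) for cw in codebook) / len(codebook)
    # The mean weight should be close to c2 * sqrt(n) = 40.
    print(len(codebook), mean_weight)
```

Because the codebook is public, nothing in this sampling step needs to be hidden from Willie; only Alice's message choice and the channel noise remain random from his point of view.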
Let 
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Theorem 3. For any p^ > max{|, | — ^-^^} , and any relative throughput r E {cg, cio) with probability 
greater than 1 — 2^^^^^\ a random public codebook C is simultaneously (1 — e)-reliable and (1 — e)- 
deniable. 

Note 3.1: One may notice that since we have not optimized all constants in our proof, we require not
only that $p_w > p_b$, but in fact that $p_w$ is rather close to 1/2 (though it still works for values bounded away
from 1/2). It would be interesting to examine whether in fact any $p_w > p_b$ would also work.



D. Deniability - lower bound on code parameters 

Finally, we are able to use our proof techniques to prove a novel lower bound on the throughput of
any highly deniable code. Intuitively, our argument follows from noting that the set of the $\mathbf{y}_w$ that are
"noise-typical" with respect to the all-zero codeword (corresponding to Alice not transmitting anything)
comprises $\mathbf{y}_w$ sequences of weight approximately $p_w n$. But any transmitted non-zero codeword $\mathbf{x}$ can
only be "noise-typical" with respect to a subset of these $\mathbf{y}_w$ sequences of weight approximately $p_w n$. But
for $\Pr_{\mathbf{Z}_w}(\mathbf{y}_w | T = 0)$ and $\Pr_{M,\mathbf{Z}_w}(\mathbf{y}_w | T = 1)$ to be "close" (so that the code is highly deniable), the
set of $\mathbf{y}_w$ that are "noise-typical" with respect to some transmitted codeword should "cover" most of the
sequences of weight approximately $p_w n$. We then use a counting argument to show an inequality that
bounds from below a linear function of the weight distribution of any codebook that is highly deniable.

For each $i \in \{1, \ldots, n\}$, let $N_i(\mathcal{C})$ denote the number of codewords of Hamming weight $i$ in Alice's
codebook $\mathcal{C}$ - by definition $\sum_{i=1}^{n} N_i(\mathcal{C}) = |\mathcal{C}|$. Then we can demonstrate the following lower bound on
the set of $N_i(\mathcal{C})$.

Theorem 4. Any code C that is (1 — e) -deniable satisfies the inequality 



l-e< 



(Elo N^{C))j 



n 
j=0 



(l-Pn,{l + 5(i))-Pn,fi 



+ e~3 



\S(i)'^Pwi 



N.,{C). 



This means, for instance, that if one wishes to choose a highly deniable codebook such that
all codewords are of weight $\Theta(\sqrt{n})$, then in fact there is a lower bound on the number of codewords that
scales as $2^{\Omega(\sqrt{n})}$.

Note 4.1: The bound in this theorem arises from the fact that codewords $\mathbf{x}$ of different weights are able to
cover different numbers of sequences of weight approximately $p_w n$. In fact, low-weight codewords cover
more sequences than do high-weight codewords, hence to increase the deniability of a codebook $\mathcal{C}$,
low-weight codewords (of weight $o(\sqrt{n})$) are substantially more valuable than high-weight codewords. One
must be careful in introducing too many low-weight codewords, however, since a careful examination of
the proof techniques in Theorem 3 shows that too many low-weight codewords may reduce the reliability
of the codebook. A better understanding of the inherent tradeoffs between reliability and deniability, as
mediated by mixtures of codewords of different weights, is a subject of ongoing investigation.

V. Proof of Theorem 1

Willie denotes the fraction of 1's in his observed codeword $\mathbf{Y}_w$ by the random variable $S$.

Note that if Alice does not transmit anything, then $\mathbf{Y}_w$ is purely the result of noise on the channel from
Alice to Willie. In this case the expected value and variance of $S$ (over the randomness in the channel
noise) are respectively

$$E[S | T = 0] = p_w, \quad \text{and} \quad \mathrm{Var}[S | T = 0] = \frac{1}{n} p_w(1 - p_w).$$

On the other hand, suppose Alice transmits a codeword $\mathbf{X}$ of weight $p_0 n$. In this case the expected
value and variance of $S$ (over the randomness in the channel noise) are respectively

$$E[S | \mathbf{X}] = p_w * p_0, \quad \text{and} \quad \mathrm{Var}[S | \mathbf{X}] = \frac{1}{n} (p_w * p_0)(1 - p_w * p_0).$$

(Recall that $p_w * p_0$ is defined as $p_w(1 - p_0) + (1 - p_w)p_0$.)

Willie chooses a threshold $t$, an estimation-design parameter to be chosen later. He then sets his estimator
$\mathrm{Est}_{\mathcal{C}}(\mathbf{Y}_w)$ to equal 0 if $S < p_w + t$, and 1 otherwise.
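
To make the estimator concrete, the sketch below simulates it for a single codeword weight and compares the empirical error rates with the Chebyshev bounds implied by the means and variances above. All names (`simulate_threshold_estimator`, `chebyshev_bounds`) and the parameter values are illustrative assumptions, not part of the original proof.

```python
import random


def simulate_threshold_estimator(n, p_w, weight, t, trials, seed=0):
    """Estimate false-alarm and missed-detection rates of the rule
    'declare T = 1 iff S >= p_w + t', where S is the fraction of 1's
    that Willie observes.
    """
    rng = random.Random(seed)
    false_alarms = 0
    missed = 0
    for _ in range(trials):
        # T = 0: the channel input is all zeros, so every 1 is a noise flip.
        ones_silent = sum(rng.random() < p_w for _ in range(n))
        if ones_silent / n >= p_w + t:
            false_alarms += 1
        # T = 1: a codeword with `weight` ones; a 1 survives unless flipped,
        # and a 0 becomes a 1 when flipped.
        ones_active = sum(rng.random() >= p_w for _ in range(weight))
        ones_active += sum(rng.random() < p_w for _ in range(n - weight))
        if ones_active / n < p_w + t:
            missed += 1
    return false_alarms / trials, missed / trials


def chebyshev_bounds(n, p_w, p_0, t):
    """Chebyshev upper bounds on (false alarm, missed detection)."""
    q = p_w * (1 - p_0) + (1 - p_w) * p_0   # p_w * p_0
    gap = p_0 * (1 - 2 * p_w) - t           # E[S|X] minus the threshold
    if t <= 0 or gap <= 0:
        raise ValueError("need 0 < t < p_0 (1 - 2 p_w)")
    return p_w * (1 - p_w) / (n * t ** 2), q * (1 - q) / (n * gap ** 2)


if __name__ == "__main__":
    n, p_w, weight, t = 400, 0.2, 100, 0.075
    print(simulate_threshold_estimator(n, p_w, weight, t, trials=500))
    print(chebyshev_bounds(n, p_w, weight / n, t))
```

In a run like this the empirical rates typically sit far below the Chebyshev bounds, which is expected: Chebyshev's inequality is loose, but it is all the argument needs.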

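By Chebyshev's inequality, we have

$$\alpha(\mathrm{Est}_{\mathcal{C}}(\mathbf{Y}_w)) = \Pr_{\mathbf{Z}_w}(S \ge p_w + t) \le \Pr_{\mathbf{Z}_w}(|S - p_w| \ge t) \le \frac{p_w(1 - p_w)}{n t^2}. \quad (1)$$

Similarly, for codewords of weight $n p_0$ we have

$$
\begin{aligned}
\beta^{(p_0)}(\mathrm{Est}_{\mathcal{C}}(\mathbf{Y}_w)) &= \Pr_{\mathbf{Z}_w}(S < p_w + t) \\
&\le \Pr_{\mathbf{Z}_w}(|S - p_w - p_0 + 2p_w p_0| > p_0 - 2p_w p_0 - t) \\
&= \Pr_{\mathbf{Z}_w}(|S - p_w * p_0| > p_0 - 2p_w p_0 - t) \\
&\le \frac{p_w + p_0 - 2p_w p_0 - (p_w + p_0 - 2p_w p_0)^2}{n (p_0 - 2p_w p_0 - t)^2}. \quad (2)
\end{aligned}
$$

Here, $\beta^{(p_0)}(\mathrm{Est}_{\mathcal{C}}(\mathbf{Y}_w))$ is the missed detection probability with codewords of weight $n p_0$. Notice that
for a fixed $t$, the RHS of equation (1) is fixed and the RHS of equation (2) increases as $p_0$ decreases.
Note that the deniability of Alice's codebook is

$$
\begin{aligned}
\alpha + \beta &= \alpha + \sum_{\mathbf{x}} \beta(\mathbf{x}) p(\mathbf{x}) \\
&= \sum_{\mathbf{x}: \mathrm{wt}_H(\mathbf{x}) > c_1\sqrt{n}} (\alpha + \beta(\mathbf{x})) p(\mathbf{x}) + \sum_{\mathbf{x}: \mathrm{wt}_H(\mathbf{x}) \le c_1\sqrt{n}} (\alpha + \beta(\mathbf{x})) p(\mathbf{x}) \\
&= \gamma \left(\alpha + \beta^{(p)}\right)_{p > c_1/\sqrt{n}} + (1 - \gamma) \left(\alpha + \beta^{(p)}\right)_{p \le c_1/\sqrt{n}}.
\end{aligned}
$$

For notational convenience, we define

$$
\begin{aligned}
A &= p_w(1 - p_w), \\
B &= p_w + p_0 - 2p_w p_0 - (p_w + p_0 - 2p_w p_0)^2, \\
C &= p_0(1 - 2p_w).
\end{aligned}
$$

Substituting these values of $A$, $B$ and $C$ into (1) and (2) gives us that the deniability is at most

$$\alpha + \beta^{(p_0)} \le \frac{1}{n}\left[A t^{-2} + B(C - t)^{-2}\right].$$

We thus define our outer bound on deniability as a function of $t$ as

$$f(t) = \frac{1}{n}\left[A t^{-2} + B(C - t)^{-2}\right]. \quad (3)$$

To optimize the bound we set $\frac{df}{dt} = 0$, giving

$$
\begin{aligned}
& \frac{1}{n}\left[-2A t^{-3} + 2B(C - t)^{-3}\right] = 0 \\
\Leftrightarrow\ & -2A t^{-3} + 2B(C - t)^{-3} = 0 \\
\Leftrightarrow\ & 2B(C - t)^{-3} = 2A t^{-3} \\
\Leftrightarrow\ & t = \frac{C}{\left(\frac{B}{A}\right)^{1/3} + 1}.
\end{aligned}
$$

Denoting this value of the threshold as $t^*$ and substituting into (3) gives

$$
\begin{aligned}
f(t^*) &= \frac{1}{n}\left[\frac{A\left(\left(\frac{B}{A}\right)^{1/3} + 1\right)^2}{C^2} + \frac{B\left(\left(\frac{B}{A}\right)^{1/3} + 1\right)^2}{C^2\left(\frac{B}{A}\right)^{2/3}}\right] \\
&= \frac{1}{nC^2}\left(\left(\frac{B}{A}\right)^{1/3} + 1\right)^2\left[A + B^{1/3}A^{2/3}\right] \\
&= \frac{1}{nC^2}\left(\left(\frac{B}{A}\right)^{1/3} + 1\right)^2 A^{2/3}\left(A^{1/3} + B^{1/3}\right).
\end{aligned}
$$

Note that for $p_0 = \omega(1/\sqrt{n})$, as $n$ grows without bound, $nC^2$ is also unbounded, but $\left(\frac{B}{A}\right)^{1/3} + 1$ and
$A^{1/3} + B^{1/3}$ both converge to constants. This means that $f(t^*) \to 0$ as $n \to \infty$. Thus, for a highly deniable
code, $p_0$ must be $O(1/\sqrt{n})$.

More precisely, we set $p_0 = c_1 n^{-\delta}$, where $\delta \in (0, 1)$. We then have $B$ converges to $A$ as $n$ grows
without bound, and

$$
\lim_{n \to \infty} nC^2 =
\begin{cases}
\infty & \text{if } \delta \in (0, \tfrac{1}{2}), \\
c_1^2(1 - 2p_w)^2 & \text{if } \delta = \tfrac{1}{2}, \\
0 & \text{if } \delta \in (\tfrac{1}{2}, 1).
\end{cases}
$$

Hence, an outer bound on the deniability of any code is given by

$$
\lim_{n \to \infty} f(t^*) =
\begin{cases}
0 & \text{if } \delta \in (0, \tfrac{1}{2}), \\
\frac{8p_w(1 - p_w)}{c_1^2(1 - 2p_w)^2} & \text{if } \delta = \tfrac{1}{2}, \\
\infty & \text{if } \delta \in (\tfrac{1}{2}, 1).
\end{cases}
\quad (4)
$$

So, the contribution due to the terms of weight less than $c_1\sqrt{n}$ is at most 1. Since at least a $\gamma$ fraction of
the codewords are of weight at least $c_1\sqrt{n}$, we have

$$\alpha + \beta \le 1 - \gamma + \gamma\,\frac{8p_w(1 - p_w)}{c_1^2(1 - 2p_w)^2}.$$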
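
The optimisation over the threshold can be sanity-checked numerically. The sketch below evaluates $f(t) = \frac{1}{n}[At^{-2} + B(C-t)^{-2}]$ at the stationary point $t = C/((B/A)^{1/3} + 1)$ and compares it with the equivalent closed form $(A^{1/3} + B^{1/3})^3/(nC^2)$, which follows from the last line of the substitution by expanding $((B/A)^{1/3} + 1)^2 A^{2/3} = (A^{1/3} + B^{1/3})^2$. Function names and the sample parameters are assumptions made for illustration.

```python
import math


def abc_constants(p_w, p_0):
    """Return (A, B, C) as defined in the proof."""
    q = p_w + p_0 - 2 * p_w * p_0          # p_w * p_0
    return p_w * (1 - p_w), q - q ** 2, p_0 * (1 - 2 * p_w)


def f_bound(t, n, A, B, C):
    """Chebyshev-based bound f(t) on alpha + beta for threshold offset t."""
    return (A / t ** 2 + B / (C - t) ** 2) / n


def optimal_threshold(A, B, C):
    """Stationary point of f on (0, C)."""
    return C / ((B / A) ** (1.0 / 3.0) + 1.0)


def f_closed_form(n, A, B, C):
    """Value of f at the optimal threshold, in simplified form."""
    return (A ** (1.0 / 3.0) + B ** (1.0 / 3.0)) ** 3 / (n * C ** 2)


if __name__ == "__main__":
    n, p_w, c1 = 10_000, 0.2, 4.0
    A, B, C = abc_constants(p_w, c1 / math.sqrt(n))
    t_star = optimal_threshold(A, B, C)
    print(f_bound(t_star, n, A, B, C), f_closed_form(n, A, B, C))
    # Limiting value from (4) for delta = 1/2:
    print(8 * p_w * (1 - p_w) / (c1 ** 2 * (1 - 2 * p_w) ** 2))
```

For moderate $n$ the finite-$n$ value is slightly above the limit in (4), because $B$ is still somewhat larger than $A$.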

VI. Proof of Theorem 2

A. When $p_w \le p_b$

(This includes the boundary cases when $p_w = 0$, or when $p_b = 1/2$.) In this scenario, Willie employs as
his estimator the same function that Bob uses to decode. If Alice and Bob attempt to communicate with
$(1 - \epsilon)$-reliability at any positive throughput, then Willie must also be able to decode (not just estimate
Alice's transmission status) with reliability at least $1 - \epsilon$. Hence the deniability is at most $\epsilon$, leading to a
contradiction.

B. When $p_w > p_b$ and $p_b, p_w \in (0, 1/2)$

In this, the primary scenario under consideration in this work, Willie proceeds as follows.

Setting $\gamma = 1/2$ in Theorem 1, for a code to be $(1 - \epsilon)$-deniable, at most 1/2 the codewords of $\mathcal{C}$ are
of weight more than $p_\epsilon n$, where (by the result of Theorem 1),

$$p_\epsilon = \frac{c_1}{\sqrt{n}}, \quad \text{with} \quad c_1 = \frac{2\sqrt{p_w(1 - p_w)}}{\sqrt{1/2 - \epsilon}\,(1 - 2p_w)}. \quad (5)$$

We denote the subset of codewords of weight at most $p_\epsilon n$ by $\mathcal{C}'$. We now bound the relative throughput of
the code via parameters of this sub-code.

Bob's probability of decoding error when Alice is transmitting, $P_{e,T=1}$, (and thus also his overall
probability of error $P_e$) is at least 1/2 of $P'_{e,T=1}$. Here $P'_{e,T=1}$ denotes Bob's probability of decoding error
when Alice transmits a message using codebook $\mathcal{C}'$ (instead of $\mathcal{C}$). Recall that the relative throughput of
a code was defined as the binary logarithm of the number of codewords, divided by $\sqrt{n}$. Since $\mathcal{C}$ has
at most twice the number of codewords that $\mathcal{C}'$ does, if $r$ and $r'$ denote the relative throughputs of the
code $\mathcal{C}$ and the sub-code $\mathcal{C}'$, then $r' \ge r - 1/\sqrt{n}$. We use the apostrophe $'$ to denote random variables
corresponding to the sub-code $\mathcal{C}'$. Then,



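$$
\begin{aligned}
r'\sqrt{n} &= H(M') \\
&= H(M' | \hat{M}') + I(M'; \hat{M}') \\
&\le 1 + r'\sqrt{n}\,P'_{e,T=1} + I(\mathbf{X}'; \mathbf{Y}'_b) \\
&\le 1 + r'\sqrt{n}\,P'_{e,T=1} + \sum_{i=1}^{n}\left(H(Y_i) - H(Y_i | X_i)\right) \\
&\le 1 + r'\sqrt{n}\,P'_{e,T=1} + n\left(D(p_b \| p_b * p_\epsilon) + p_\epsilon(1 - 2p_b)\log\left(\frac{1 - p_b * p_\epsilon}{p_b * p_\epsilon}\right)\right).
\end{aligned}
$$

Here, the first inequality holds due to Fano's inequality and the Data Processing inequality, the second
inequality holds since the channel is memoryless, and the third inequality since $\mathcal{C}'$ only has codewords
of weight at most $p_\epsilon n$ and hence $H(p_b * p_\epsilon) \ge H(Y_i)$. Rearranging the terms of the above inequality and
replacing $P'_{e,T=1}$ by its upper bound of $2\epsilon$, we get

$$
\begin{aligned}
r' &\le \frac{1 + n\left(D(p_b \| p_b * p_\epsilon) + p_\epsilon(1 - 2p_b)\log\left(\frac{1 - p_b * p_\epsilon}{p_b * p_\epsilon}\right)\right)}{(1 - P'_{e,T=1})\sqrt{n}} \\
&\le \frac{1 + n\left(D(p_b \| p_b * p_\epsilon) + p_\epsilon(1 - 2p_b)\log\left(\frac{1 - p_b * p_\epsilon}{p_b * p_\epsilon}\right)\right)}{(1 - 2\epsilon)\sqrt{n}}.
\end{aligned}
$$

Footnote: The choice of $\gamma = 1/2$ is somewhat arbitrary, but for sufficiently small $\epsilon$, the choice of $\gamma$ does not change the
parameters of the bound by too much.

Now, by the Claims in the Appendix, we have

$$D(p_b \| p_b * p_\epsilon) \le p_\epsilon^2\left(\frac{1}{p_b} + \frac{1}{1 - p_b}\right).$$

Further, $\log\left(\frac{1 - p_b * p_\epsilon}{p_b * p_\epsilon}\right) \le \log\left(\frac{1 - p_b}{p_b}\right)$. Therefore, we have

$$r' \le \frac{1 + n\left(p_\epsilon^2\left(\frac{1}{p_b} + \frac{1}{1 - p_b}\right) + p_\epsilon(1 - 2p_b)\log\left(\frac{1 - p_b}{p_b}\right)\right)}{(1 - 2\epsilon)\sqrt{n}}.$$

Finally, we note that $p_\epsilon = c_1/\sqrt{n}$ and that $r \le r' + 1/\sqrt{n}$. Therefore, in the limit, as $n$ grows without bound, we get

$$
\begin{aligned}
r &\le \frac{c_1(1 - 2p_b)\log\left(\frac{1 - p_b}{p_b}\right)}{1 - 2\epsilon} \\
&= \sqrt{\frac{4p_w(1 - p_w)(1 - 2p_b)^2\log^2\left(\frac{1 - p_b}{p_b}\right)}{(1/2 - \epsilon)(1 - 2p_w)^2(1 - 2\epsilon)^2}} \\
&= 2\sqrt{2}\,\sqrt{p_w(1 - p_w)}\left(\frac{1 - 2p_b}{1 - 2p_w}\right)(1 - 2\epsilon)^{-3/2}\log\left(\frac{1 - p_b}{p_b}\right).
\end{aligned}
$$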
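
The third inequality in the chain above rests on rewriting the entropy gap $H(p_b * p_\epsilon) - H(p_b)$ as a divergence term plus a logarithmic term. The following sketch checks that identity numerically, with all logarithms taken to base 2; the function names are hypothetical and the parameter values are arbitrary.

```python
import math


def star(a, b):
    """Binary convolution a * b = a(1 - b) + (1 - a)b."""
    return a * (1 - b) + (1 - a) * b


def binary_entropy(x):
    """Binary entropy H(x) in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def kl_divergence(p, q):
    """D(p || q) in bits between Bernoulli(p) and Bernoulli(q), 0 < p, q < 1."""
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))


def entropy_gap(p_b, p_eps):
    """H(p_b * p_eps) - H(p_b), computed directly."""
    return binary_entropy(star(p_b, p_eps)) - binary_entropy(p_b)


def decomposed_gap(p_b, p_eps):
    """D(p_b || q) + p_eps (1 - 2 p_b) log((1 - q) / q), with q = p_b * p_eps."""
    q = star(p_b, p_eps)
    return kl_divergence(p_b, q) + p_eps * (1 - 2 * p_b) * math.log2((1 - q) / q)


if __name__ == "__main__":
    print(entropy_gap(0.1, 0.02), decomposed_gap(0.1, 0.02))
```

The two printed values should agree up to floating-point error, since $q - p_b = p_\epsilon(1 - 2p_b)$ makes the decomposition an exact algebraic identity.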



C. When $p_b = 0$

In this setting, reliability is guaranteed by default. Hence Alice only needs to guarantee deniability. A
necessary condition for this is that for at least half the codewords in Alice's codebook, the fraction of 1s in
the codeword should be no more than $p_\epsilon$, whose value is given in (5). But the total number of such
"low-weight" codewords is at most $\sum_{i=1}^{p_\epsilon n}\binom{n}{i}$. We may bound this summation from above by $(p_\epsilon n)\binom{n}{p_\epsilon n}$,
and hence the total number of codewords is at most $2(p_\epsilon n)\binom{n}{p_\epsilon n}$. Substituting the value of $p_\epsilon$ from (5) and
using Stirling's approximation gives one an outer bound on the number of possible codewords in Alice's
codebook. Since for highly reliable communication over noiseless channels the number of codewords is
essentially the same as the number of messages (stochastic encoding does not change the throughput "by
much"), this gives us the required outer bound on the throughput.
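
A small numerical sketch of this counting step follows. It takes the weight cutoff to be $k = \lceil c_1\sqrt{n} \rceil$ for a free constant `c1` (an assumption standing in for the value of $p_\epsilon n$), and evaluates the resulting bound on the relative throughput exactly with integer arithmetic rather than through Stirling's approximation. Function names are hypothetical.

```python
import math


def low_weight_codebook_bound(n, k):
    """Upper bound 2 * k * C(n, k) on the codebook size when at least half
    of the codewords have weight at most k (valid for 1 <= k <= n / 2,
    where the binomial coefficients are increasing)."""
    if not 1 <= k <= n // 2:
        raise ValueError("need 1 <= k <= n / 2")
    return 2 * k * math.comb(n, k)


def relative_throughput_bound(n, c1):
    """log2(codebook size bound) / sqrt(n), with weight cutoff ceil(c1 sqrt(n))."""
    k = math.ceil(c1 * math.sqrt(n))
    return math.log2(low_weight_codebook_bound(n, k)) / math.sqrt(n)


if __name__ == "__main__":
    for n in (10 ** 4, 10 ** 5, 10 ** 6):
        # The bound grows roughly like (c1 / 2) * log2(n) plus a constant.
        print(n, relative_throughput_bound(n, 1.0))
```

The slow growth with $n$ is what produces the logarithmic relative throughput in the third case of Theorem 2.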



D. When $p_w = 1/2$

If $p_w = 1/2$, any transmission of Alice's is perfectly deniable. Hence she just has to guarantee $(1 - \epsilon)$-
reliability. Hence the corresponding outer bound corresponds to the outer bound for a BSC($p_b$), with error
probability at most $\epsilon$.

VII. Proof of Theorem 3

A. Table of Internal Parameters



fraction of the variance of $\mathrm{wt}_H(\mathbf{x})$
fraction of the variance of $\mathrm{wt}_H(\mathbf{y})$
fraction of the variance of $d_H(\mathbf{x}, \mathbf{y})$



parameter in reliability
codeword constant



_v/2e 



Chernoff bound constant term in the exponent for reliability: $2(\frac{1}{2} - p_b)^2$

$\Delta$ value in Chernoff bound for $\mathbf{y}$ and $d_H(\mathbf{x}, \mathbf{y})$ when achieving $\epsilon_2$

$\Delta_x$ value in Chernoff bound for $\mathbf{x}$ when achieving $\epsilon_3$

Maximum constant when $\Pr(\mathbf{X} | \mathbf{y})$ is smallest

Internal constant in deniability proof

Chernoff bound for $\mathbf{y}$ and $d_H(\mathbf{x}, \mathbf{y})$ outside the variance

Chernoff bound for $\mathbf{x}$ outside the variance

relative variation of the type class size



In the two following subsections we argue respectively that a randomly chosen code is highly reliable, and
also highly deniable, in each case with overwhelming probability (super-exponentially close to 1 as $n$
increases without bound) over the randomness in the choice of the codebook $\mathcal{C}$. This then implies the
existence of a single (publicly known) codebook that is simultaneously highly reliable and highly deniable.



B. Reliability 

Recall the codebook $\mathcal{C}$ was generated by choosing $2^{r\sqrt{n}}$ codewords, with each bit of each codeword
generated i.i.d. according to Bernoulli($p$). Bob uses a minimum-distance decoder.

If Alice did not transmit ($T = 0$) we define $P_e^{(0)}$ as the probability (over $\mathbf{Z}_b$) of the event
$\{\hat{M} \neq 0 | T = 0\}$. If Alice did transmit, then the error event corresponds to $\{\hat{M} \neq M | T = 1\}$. This event
includes two distinct scenarios - one corresponding to $\hat{M} \neq 0$ (Bob decoding to the wrong non-zero message), and
the other corresponding to $\hat{M} = 0$ (Bob estimating that Alice did not transmit anything, even though
Alice did transmit). We use $P_e^{(1)}$ to denote the probability (over $M, \mathbf{Z}_b$) of the first of these two scenarios
$\{\hat{M} \neq 0, \hat{M} \neq M | T = 1\}$. We then use the observation that due to the properties of the zero codeword $\mathbf{0}$,
and of minimum-distance decoders, the probability (over $M, \mathbf{Z}_b$) of Bob estimating $\hat{M} = 0$ even though
$T = 1$ equals the probability (over $\mathbf{Z}_b$) of Bob estimating $\hat{M} \neq 0$ even though $T = 0$. Given this, we
simplify our overall probability of error $P_e$ as the sum $2P_e^{(0)} + P_e^{(1)}$ (the factor of 2 arises from our
observation above).

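The error event $\mathcal{E}$ is defined as the union of the error events. By slight abuse of notation, we use $P_e(\mathcal{C})$
etc. to denote the probabilities corresponding to specific codes $\mathcal{C}$.
Note that

$$
\begin{aligned}
P_e^{(1)} &= \sum_{\mathcal{C}} \Pr(\mathcal{C})\,P_e^{(1)}(\mathcal{C}) \\
&= \sum_{\mathcal{C}} \Pr(\mathcal{C}) \sum_{m=1}^{2^{r\sqrt{n}}} \Pr(M = m)\Pr(\mathcal{E} | M = m, \mathbf{C} = \mathcal{C}) \\
&= \sum_{\mathcal{C}} \sum_{m=1}^{2^{r\sqrt{n}}} 2^{-r\sqrt{n}}\Pr(\mathcal{C})\Pr(\mathcal{E} | M = m, \mathbf{C} = \mathcal{C}) \\
&= \sum_{m=1}^{2^{r\sqrt{n}}} 2^{-r\sqrt{n}} \sum_{\mathcal{C}} \Pr(\mathcal{C})\Pr(\mathcal{E} | M = m, \mathbf{C} = \mathcal{C}) \\
&= \sum_{\mathcal{C}} \Pr(\mathcal{C})\Pr(\mathcal{E} | M = m, \mathbf{C} = \mathcal{C}) \\
&= \Pr_{\mathcal{C}}(\mathcal{E} | M = m).
\end{aligned}
$$

So, without loss of generality we assume that the first message is sent. That is, $P_e^{(1)} = \Pr_{\mathcal{C}}(\mathcal{E} | M = 1)$.
As is common in Shannon-theoretic achievability proofs, all subsequent calculations of probabilities in
this subsection are also calculated over the randomness of the choice of specific codebook $\mathcal{C}$, besides the
randomness in the channel noise $\mathbf{Z}_b$.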

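Bob's probability of decoding error can now be calculated in two steps. First, we focus on $P_e^{(1)}$:

$$P_e^{(1)} \le \sum_{m: m \neq 1} \Pr_{\mathbf{Z}_b}\left[d_H(\mathbf{x}(1), \mathbf{y}) > d_H(\mathbf{x}(m), \mathbf{y})\right].$$

First, denote $E_m^{(1)} = \{d_H(\mathbf{x}(1), \mathbf{y}) > d_H(\mathbf{x}(m), \mathbf{y})\}$, where $m \neq 1$, to be the error event given
that the message 1 is chosen. Then we have that

$$
\begin{aligned}
P_e^{(1)} &= \sum_{\mathbf{x} \in \{0,1\}^n} \Pr(\mathbf{X}(1) = \mathbf{x}) \Pr_{\mathbf{Z}_b}\left(\bigcup_{m=2}^{2^{r\sqrt{n}}} E_m^{(1)}\right) \\
&\le \sum_{\mathbf{x} \in \{0,1\}^n} \Pr(\mathbf{X}(1) = \mathbf{x}) \sum_{m=2}^{2^{r\sqrt{n}}} \Pr_{\mathbf{Z}_b}\left(E_m^{(1)}\right) \\
&= \sum_{m=2}^{2^{r\sqrt{n}}} \sum_{\mathbf{x} \in \{0,1\}^n} \Pr(\mathbf{X}(1) = \mathbf{x}) \Pr_{\mathbf{Z}_b}\left(E_m^{(1)}\right).
\end{aligned}
$$

We use $T_m^{(1)}$ to denote $\mathrm{supp}(\mathbf{x}(1)) \,\triangle\, \mathrm{supp}(\mathbf{x}(m))$. Here $\mathrm{supp}(\cdot)$ denotes the support of a binary codeword,
and $\triangle$ denotes the symmetric difference of the corresponding sets. This means $|T_m^{(1)}|$ is a random variable
over the randomness in $M$. For notational convenience, we use $E_{\mathbf{x}(1)}(\cdot)$ to denote
$\sum_{\mathbf{x} \in \{0,1\}^n} \Pr(\mathbf{X}(1) = \mathbf{x})(\cdot)$.
Then,

$$
\begin{aligned}
E_{\mathbf{x}(1)}\left[\Pr\left(E_m^{(1)}\right)\right] &= E_{\mathbf{x}(1)}\left[\Pr_{\mathbf{Z}_b}\left(d_H(\mathbf{x}(1), \mathbf{y}) > d_H(\mathbf{x}(m), \mathbf{y})\right)\right] \\
&= E_{\mathbf{x}(1)}\left[\Pr_{\mathbf{Z}_b}\left(\text{ratio of 1's of } \mathbf{Z}_b \text{ in } T_m^{(1)} > \tfrac{1}{2}\right)\right] \\
&= E_{T_m^{(1)}}\left[\Pr_{\mathbf{Z}_b}\left(\text{ratio of 1's of } \mathbf{Z}_b \text{ in } T_m^{(1)} > \tfrac{1}{2}\right)\right] \\
&= E_{T_m^{(1)}}\left[\sum_{i > \frac{1}{2}|T_m^{(1)}|} \binom{|T_m^{(1)}|}{i} p_b^i (1 - p_b)^{|T_m^{(1)}| - i}\right] \\
&\le E_{T_m^{(1)}}\left[e^{-2(\frac{1}{2} - p_b)^2 |T_m^{(1)}|}\right],
\end{aligned}
$$

with the last inequality following from the Chernoff bound.

Recall that we set $p = c_2/\sqrt{n}$. Also note that
$|T_m^{(1)}| = |\mathrm{supp}(\mathbf{x}(1))| + |\mathrm{supp}(\mathbf{x}(m))| - 2|\mathrm{supp}(\mathbf{x}(1)) \cap \mathrm{supp}(\mathbf{x}(m))|$.
Then, we have that $\mu_m^{(1)}$, defined as $E(|T_m^{(1)}|)$, equals $c_2\sqrt{n} + c_2\sqrt{n} - 2c_2^2 = 2c_2(\sqrt{n} - c_2)$. Also, since all
$\mu_m^{(1)}$ are equal, we use $\mu^{(1)}$ to denote $\mu_m^{(1)}$. So, $\Pr(|T_m^{(1)}| < (1 - \delta)\mu^{(1)}) \le e^{-\frac{1}{2}\delta^2\mu^{(1)}}$.
Hence, letting $c_3 = 2(\frac{1}{2} - p_b)^2$, we have

$$
\begin{aligned}
E_{T_m^{(1)}}\left[e^{-c_3|T_m^{(1)}|}\right] &= \sum_{|T_m^{(1)}| < (1 - \delta)\mu^{(1)}} \Pr(|T_m^{(1)}|)\,e^{-c_3|T_m^{(1)}|} + \sum_{|T_m^{(1)}| \ge (1 - \delta)\mu^{(1)}} \Pr(|T_m^{(1)}|)\,e^{-c_3|T_m^{(1)}|} \\
&\le e^{-\frac{1}{2}\delta^2\mu^{(1)}} + e^{-c_3(1 - \delta)\mu^{(1)}}.
\end{aligned}
$$

Therefore,

$$P_e^{(1)} \le \sum_{m=2}^{2^{r\sqrt{n}}} \left(e^{-\frac{1}{2}\delta^2\mu^{(1)}} + e^{-c_3(1 - \delta)\mu^{(1)}}\right) < 2^{r\sqrt{n}}\left(e^{-\frac{1}{2}\delta^2\mu^{(1)}} + e^{-c_3(1 - \delta)\mu^{(1)}}\right).$$

Noting that $\mu^{(1)}$ equals $2c_2(\sqrt{n} - c_2)$, we note that for any
$r < \frac{2c_2(\sqrt{n} - c_2)}{\sqrt{n}}(\log e)\min\left\{\frac{1}{2}\delta^2, c_3(1 - \delta)\right\}$ (the
right hand side of which converges to $2c_2(\log e)\min\left\{\frac{1}{2}\delta^2, c_3(1 - \delta)\right\}$), $P_e^{(1)}$ goes to zero
superpolynomially quickly in $n$ (more specifically, as $2^{-\Omega(\sqrt{n})}$).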
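
The rate condition above can be explored numerically. The sketch below computes $\mu^{(1)}$, the finite-$n$ rate threshold, and the base-2 logarithm of the union bound $2^{r\sqrt{n}}(e^{-\delta^2\mu^{(1)}/2} + e^{-c_3(1-\delta)\mu^{(1)}})$. The function names and the sample values of $c_2$, $p_b$, $\delta$ and $r$ are illustrative assumptions.

```python
import math


def mean_symmetric_difference(n, c2):
    """E|T_m^(1)| for two independent Bernoulli(c2 / sqrt(n)) codewords."""
    p = c2 / math.sqrt(n)
    return 2 * n * p * (1 - p)


def rate_threshold(n, c2, p_b, delta):
    """Largest relative throughput r for which the union bound still decays."""
    c3 = 2 * (0.5 - p_b) ** 2
    mu = mean_symmetric_difference(n, c2)
    exponent = min(delta ** 2 / 2, c3 * (1 - delta))
    return (mu / math.sqrt(n)) * math.log2(math.e) * exponent


def log2_union_bound(n, r, c2, p_b, delta):
    """log2 of 2^(r sqrt(n)) * (exp(-delta^2 mu / 2) + exp(-c3 (1 - delta) mu))."""
    c3 = 2 * (0.5 - p_b) ** 2
    mu = mean_symmetric_difference(n, c2)
    a = delta ** 2 * mu / 2
    b = c3 * (1 - delta) * mu
    m = min(a, b)
    # log-sum-exp keeps the computation stable when both terms are tiny.
    log_sum = -m + math.log(math.exp(-(a - m)) + math.exp(-(b - m)))
    return r * math.sqrt(n) + log_sum / math.log(2)


if __name__ == "__main__":
    for n in (10 ** 4, 10 ** 6):
        print(n, rate_threshold(n, 2.0, 0.1, 0.5),
              log2_union_bound(n, 0.3, 2.0, 0.1, 0.5))
```

For any fixed $r$ below the threshold, the logarithm of the bound decreases linearly in $\sqrt{n}$, which is the $2^{-\Omega(\sqrt{n})}$ decay stated above.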

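Similarly, the probability of error given $T = 0$ is

$$
\begin{aligned}
P_e^{(0)} &= \Pr_{\mathbf{Z}_b}\left[\exists m,\ d_H(\mathbf{0}, \mathbf{y}) > d_H(\mathbf{x}(m), \mathbf{y})\right] \\
&= \Pr_{\mathbf{Z}_b}\left[\exists m,\ \mathrm{wt}_H(\mathbf{z}) > d_H(\mathbf{x}(m), \mathbf{z})\right].
\end{aligned}
$$

Denote $E_m^{(0)} = \{\mathrm{wt}_H(\mathbf{y}) > d_H(\mathbf{x}(m), \mathbf{y})\}$ to be the error event when Alice does not communicate with
Bob. We also denote $T_m^{(0)} = \mathrm{supp}(\mathbf{x}(m))$. Then, we have

$$P_e^{(0)} \le \sum_{m=1}^{2^{r\sqrt{n}}} \sum_{\mathbf{x} \in \{0,1\}^n} \Pr(\mathbf{x}(m) = \mathbf{x}) \Pr_{\mathbf{Z}_b}\left(E_m^{(0)}\right).$$

Note that since $p = c_2/\sqrt{n}$, $\mu_m^{(0)}$, defined as $E(|T_m^{(0)}|)$, equals $c_2\sqrt{n}$. We use $\mu^{(0)}$ to denote $\mu_m^{(0)}$.
Hence,

$$
\begin{aligned}
E_{\mathbf{x}(m)}\left[\Pr\left(E_m^{(0)}\right)\right] &= E_{\mathbf{x}(m)}\left[\Pr_{\mathbf{Z}_b}\left(\text{ratio of 1's of } \mathbf{Z}_b \text{ in } T_m^{(0)} > \tfrac{1}{2}\right)\right] \\
&= E_{T_m^{(0)}}\left[\Pr_{\mathbf{Z}_b}\left(\text{ratio of 1's of } \mathbf{Z}_b \text{ in } T_m^{(0)} > \tfrac{1}{2}\right)\right] \\
&= E_{T_m^{(0)}}\left[\sum_{i > \frac{1}{2}|T_m^{(0)}|} \binom{|T_m^{(0)}|}{i} p_b^i (1 - p_b)^{|T_m^{(0)}| - i}\right] \\
&\le E_{T_m^{(0)}}\left[e^{-c_3|T_m^{(0)}|}\right].
\end{aligned}
$$

Thus,

$$E_{T_m^{(0)}}\left[e^{-c_3|T_m^{(0)}|}\right] \le e^{-\frac{1}{2}\delta^2\mu^{(0)}} + e^{-c_3(1 - \delta)\mu^{(0)}},$$

and hence

$$P_e^{(0)} \le \sum_{m=1}^{2^{r\sqrt{n}}} \left(e^{-\frac{1}{2}\delta^2\mu^{(0)}} + e^{-c_3(1 - \delta)\mu^{(0)}}\right) \le 2^{r\sqrt{n}}\left(e^{-\frac{1}{2}\delta^2\mu^{(0)}} + e^{-c_3(1 - \delta)\mu^{(0)}}\right).$$

Noting that $\mu^{(0)}$ equals $c_2\sqrt{n}$, we note that for any

$$r < \frac{\mu^{(0)}}{\sqrt{n}}(\log e)\min\left\{\frac{1}{2}\delta^2, c_3(1 - \delta)\right\}, \quad (6)$$

the right hand side converges to $c_2(\log e)\min\left\{\frac{1}{2}\delta^2, c_3(1 - \delta)\right\}$, which implies that $P_e^{(0)}$ goes to zero
superpolynomially quickly in $n$ (more specifically, as $2^{-\Omega(\sqrt{n})}$).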

Now recall that the overall probability of error equals $2P_e^{(0)} + P_e^{(1)}$. Hence, for random codes, Bob's
overall probability of error decays to zero as $2^{-\Omega(\sqrt{n})}$. But this probability also includes the randomness
over choice of codebook $\mathcal{C}$. Hence by Markov's inequality this means that at least a $1 - 2^{-\Omega(\sqrt{n})}$ fraction
of all possible codes are $(1 - \epsilon)$-reliable.

C. Deniability

We now prove that a random code $\mathcal{C}$ also has overwhelming probability of being highly deniable. This
requires substantially more work than the proof of reliability in the previous subsection, which broadly
followed somewhat standard achievability techniques.

For notational convenience, for each $\mathbf{y} \in \{0, 1\}^n$ let $\Pr^0(\mathbf{y})$ denote the probability (over the random
variable $\mathbf{Z}_w$) that Willie observes $\mathbf{y}$ given that Alice's transmission status is 0 (she transmits nothing).
Analogously, let $\Pr^1(\mathbf{y})$ denote the probability (over the random variables $M, \mathbf{Z}_w$) that Willie observes
$\mathbf{y}$ given that Alice's transmission status is 1 (to reiterate - this probability is calculated as an average
over both the random variables $M$ and $\mathbf{Z}_w$, but is for a specific code $\mathcal{C}$, and hence is not necessarily
"smooth"). Finally, let $E_{\mathcal{C}}(\Pr^1(\mathbf{y}))$ denote the "smoothed" version of $\Pr^1(\mathbf{y})$, specifically, averaged over
all codebooks generated according to the probability distribution specified in the achievability scheme.

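Recall that a code $\mathcal{C}$ is $(1 - \epsilon)$-deniable if for every estimator $\mathrm{Est}_{\mathcal{C}}(\cdot)$ of Willie,

$$\alpha(\mathrm{Est}_{\mathcal{C}}(\cdot)) + \beta(\mathrm{Est}_{\mathcal{C}}(\cdot)) \ge 1 - \epsilon. \quad (7)$$

But by standard statistical arguments (reprised in [2] as Fact 1), (7) is implied by the condition that

$$V(\Pr\nolimits^0, \Pr\nolimits^1) \le \epsilon. \quad (8)$$

Here we use $V(\Pr_1, \Pr_2)$ to denote the variational distance between any two probability distributions $\Pr_1(\mathbf{y})$
and $\Pr_2(\mathbf{y})$, i.e., $V(\Pr_1, \Pr_2)$ is defined as

$$\frac{1}{2}\left(\sum_{\mathbf{y} \in \{0,1\}^n} |\Pr\nolimits_1(\mathbf{y}) - \Pr\nolimits_2(\mathbf{y})|\right).$$

But by the triangle inequality,

$$V(\Pr\nolimits^0, \Pr\nolimits^1) \le V(\Pr\nolimits^0, E_{\mathcal{C}}(\Pr\nolimits^1)) + V(E_{\mathcal{C}}(\Pr\nolimits^1), \Pr\nolimits^1). \quad (9)$$

Also, we note that $\Pr^0(\mathbf{y})$ corresponds to the $n$-letter distribution induced by $n$ Bernoulli-($p_w$) random
variables. Similarly, the "smoothed" distribution $E_{\mathcal{C}}(\Pr^1)$ corresponds to the $n$-letter distribution induced
by $n$ Bernoulli-($p_w * p$) random variables.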

By further standard statistical arguments (reprised in [2] as Facts 2 and 3), we have

$$V(\Pr_0, \mathbb{E}_C(\Pr_1)) \leq \sqrt{\frac{\ln 2}{2}\, D(\Pr_0 \,\|\, \mathbb{E}_C(\Pr_1))}.$$

Hence, using the Taylor series bound on the Kullback-Leibler divergence shown in Claim 2, and recalling
that $p$ is set to equal $c_2/\sqrt{n}$, we have that

$$V(\Pr_0, \mathbb{E}_C(\Pr_1)) \leq c(p_w)\, p \sqrt{n} \leq c(p_w)\, c_2, \tag{10}$$



where c{p.) = ^/U^ - '-i=^ 
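
The bound in (10) can be illustrated numerically. The following is a minimal sketch, not part of the original proof. It assumes illustrative values $p_w = 0.3$ and $c_2 = 0.5$, and it uses the fact that the variational distance between two i.i.d. Bernoulli $n$-letter distributions equals the variational distance between the corresponding binomial weight distributions (the Hamming weight is a sufficient statistic). The helper names `star`, `tv_iid_bernoulli` and `pinsker_bound` are hypothetical.

```python
import math

def star(a, b):
    """Binary convolution a * b = a(1 - b) + (1 - a)b."""
    return a * (1 - b) + (1 - a) * b

def binom_pmf(n, k, q):
    """Binomial(n, q) pmf at k, evaluated in the log domain for stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(q) + (n - k) * math.log(1 - q))
    return math.exp(log_pmf)

def tv_iid_bernoulli(n, q0, q1):
    """Variational distance between n i.i.d. Bernoulli(q0) and n i.i.d. Bernoulli(q1)."""
    return 0.5 * sum(abs(binom_pmf(n, k, q0) - binom_pmf(n, k, q1))
                     for k in range(n + 1))

def kl_bits(a, b):
    """Binary KL divergence D(a || b) in bits."""
    return a * math.log2(a / b) + (1 - a) * math.log2((1 - a) / (1 - b))

def pinsker_bound(n, q0, q1):
    """Pinsker-type bound sqrt((ln 2 / 2) * n * D(q0 || q1)) on the distance."""
    return math.sqrt(0.5 * math.log(2) * n * kl_bits(q0, q1))

if __name__ == "__main__":
    p_w, c2 = 0.3, 0.5  # assumed, purely illustrative parameters
    for n in (100, 1600, 10000):
        p = c2 / math.sqrt(n)
        q1 = star(p_w, p)
        print(n, tv_iid_bernoulli(n, p_w, q1), pinsker_bound(n, p_w, q1))
```

With $p = c_2/\sqrt{n}$ the bound stays essentially constant as $n$ grows, which is the behaviour (10) relies on.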
Up to this point, the proof is similar to the proof in [2]. The remainder of this paper focuses on the
challenging task of bounding the second term in (9). Clearly this second term need not be small for
specific codes $C$: a "bad" codebook will not behave like its expectation!¹ Slightly more precisely, we
aim to show that with high probability over $C$, $V(\mathbb{E}_C(\Pr_1), \Pr_1)$ is "small".

We now proceed as follows. For ease of notation, since for the remainder of this proof we are considering
just the deniability of the code, we drop the subscript $w$ from $\mathbf{y}$, with the understanding that $\mathbf{y}$ refers to
the codeword observed by Willie.

We first define the typical type classes of $\mathbf{Y}$ as

$$\mathcal{T}_{\{w_y\}}(\mathbf{Y}) = \{\mathbf{y} : \mathrm{wt}_H(\mathbf{y}) = w_y n \in (n(p_w * p - \Delta_y),\ n(p_w * p + \Delta_y))\}.$$

These are just the sets of vectors $\mathbf{y}$ that are $\Delta_y$-strongly typical with respect to the distribution $(p_w, 1 - p_w)$.
Then we can divide the term $V(\mathbb{E}_C(\Pr_1), \Pr_1)$ from (9) into the two terms



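$$\begin{aligned}
V(\mathbb{E}_C(\Pr_1), \Pr_1)
&= \frac{1}{2} \sum_{\mathbf{y}} |\Pr_1(\mathbf{y}) - \mathbb{E}_C \Pr_1(\mathbf{y})| \\
&= \frac{1}{2} \sum_{\mathbf{y} \in \mathcal{T}_{\{w_y\}}(\mathbf{Y})} |\Pr_1(\mathbf{y}) - \mathbb{E}_C \Pr_1(\mathbf{y})|
 + \frac{1}{2} \sum_{\mathbf{y} \notin \mathcal{T}_{\{w_y\}}(\mathbf{Y})} |\Pr_1(\mathbf{y}) - \mathbb{E}_C \Pr_1(\mathbf{y})|. \qquad (11)
\end{aligned}$$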

Here both terms correspond to the difference between the distribution on $\mathbf{y}$ observed by Willie due to the
actual code $C$ used, and the distribution on $\mathbf{y}$ if the ensemble distribution (over random codebooks) had
been used. The first term deals with the difference between these distributions for "typical" $\mathbf{y}$, and the
second one for "atypical" $\mathbf{y}$. Bounding these two terms requires different techniques, as outlined later.

We also define, for each typical $\mathbf{y}$ and any pair of integers $(dn, w_x n)$, the type classes of $\mathbf{X}$ with respect
to $\mathbf{y}$ as

$$\mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y}) = \{\mathbf{x} : d_H(\mathbf{x},\mathbf{y}) = dn,\ \mathrm{wt}_H(\mathbf{x}) = w_x n\}.$$

In words, a set of this form corresponds to $\mathbf{x}$ vectors which have a specific Hamming weight $w_x n$, and
a specific Hamming distance $dn$ from $\mathbf{y}$.

We are specifically interested in the typical type classes of $\mathbf{X}$ with respect to $\mathbf{y}$, defined as

$$\mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y}) = \{\mathbf{x} : d_H(\mathbf{x},\mathbf{y}) = dn \in (n(p_w - \Delta_{y|x}),\ n(p_w + \Delta_{y|x})),\ \mathrm{wt}_H(\mathbf{x}) = w_x n \in (np(1 - \Delta_x),\ np(1 + \Delta_x))\}.$$

In words, these correspond to type classes of $\mathbf{x}$ with respect to $\mathbf{y}$ that are strongly typical (with "slack" $\Delta_x$
and $\Delta_{y|x}$) with respect to a given $\mathbf{y}$, given that $\mathbf{X}$ is generated with i.i.d. Bernoulli($p$) components,
and that the channel from Alice to Willie is a BSC($p_w$).

Given these definitions, we note that the term $\Pr_1(\mathbf{y})$ can be written in terms of channel parameters,

¹ For instance, if the codebook has only very high-weight codewords, Theorem [1] already indicates that such a code cannot be highly
deniable. The point is, of course, that the probability (over the ensemble of random codebooks) of such a code instantiating is small.
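and the type classes defined above, as follows:

$$\Pr_1(\mathbf{y}) = \sum_{\mathbf{x} \in C} p_{Z_w}(\mathbf{y}|\mathbf{x})\, p(\mathbf{x})
= \sum_{\mathbf{x} \in \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y}) \cap C} p_{Z_w}(\mathbf{y}|\mathbf{x})\, p(\mathbf{x})
+ \sum_{\mathbf{x} \in C \setminus \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y})} p_{Z_w}(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}). \tag{12}$$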



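Also, the first term on the RHS of equation (12) can be written as

$$\sum_{\mathbf{x} \in \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y}) \cap C} p_{Z_w}(\mathbf{y}|\mathbf{x})\, p(\mathbf{x})
= \sum_{dn,\, w_x n}\ \sum_{\mathbf{x} \in \mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y}) \cap C} p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x})
= \sum_{dn,\, w_x n} |\mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y}) \cap C|\ p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}).$$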



Note that p(y|x) is constant for a particular type-class. Hence the value depends only on the number of 
codewords within a particular type-class - this number of course depends on the particular code C. To 
be able to use concentration inequalities to prove that the actual number of codewords in each "typical" 
type class we consider is close to its expectation, we first need to bound from below the expected size of 
each typical type class, and show that this size is super-linear (which would allow for tight concentration 
of measure). We note that for a given typical y, the expected number of codewords in a particular typical 
type-class with respect to that y is simply the size of the code times the probability of that type-class. 
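That is, for each typical type class $\mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y})$,

$$\mathbb{E}_C\left(|\mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y}) \cap C|\right) = \Pr\nolimits_C\left(\mathbf{x} \in \mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y})\right) |C|. \tag{13}$$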



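But, by using standard counting arguments on the number of elements in specific type-classes, we have
that

$$\begin{aligned}
\Pr\nolimits_C\left(\mathbf{X} \in \mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y})\right)
&= \binom{(d_{01}+d_{11})n}{d_{11}n} \binom{(d_{10}+d_{00})n}{d_{10}n}\, p^{(d_{10}+d_{11})n} (1-p)^{(d_{00}+d_{01})n} \\
&\geq \frac{1}{(n+1)^2}\, 2^{(d_{01}+d_{11})n H\left(\frac{d_{11}}{d_{01}+d_{11}}\right)}\, 2^{(d_{10}+d_{00})n H\left(\frac{d_{10}}{d_{10}+d_{00}}\right)}\, p^{(d_{10}+d_{11})n} (1-p)^{(d_{00}+d_{01})n} \\
&= \frac{1}{(n+1)^2}\, 2^{nH(X_T|Y_T)}\, 2^{-n(D(X_T\|X) + H(X_T))} \\
&= \frac{1}{(n+1)^2}\, 2^{-n(I(X_T;Y_T) + D(X_T\|X))}.
\end{aligned}$$

Here $d_{ij}$ is defined as $d_{ij} = \frac{|\{k : x_k = i,\, y_k = j\}|}{n}$ for $i,j \in \{0,1\}$, that is, they denote the empirical fractions of pairs
of symbols from $(\mathbf{x}, \mathbf{y})$. The inequality above follows from standard bounds on combinatorial numbers
via Stirling's approximation.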
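
The counting step can be checked on small instances. The sketch below is illustrative only: it counts the vectors $\mathbf{x}$ with a given Hamming weight and a given Hamming distance from a fixed $\mathbf{y}$ as a product of two binomial coefficients, compares the count with brute-force enumeration, and checks the standard bound $\binom{m}{k} \geq 2^{mH(k/m)}/(m+1)$. The function names are hypothetical, and the arguments `dist` and `weight` play the roles of $dn$ and $w_x n$.

```python
import math
from itertools import product

def h2(q):
    """Binary entropy in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def type_class_size(y, dist, weight):
    """Number of x with wt_H(x) = weight and d_H(x, y) = dist."""
    n1 = sum(y)           # positions where y = 1
    n0 = len(y) - n1      # positions where y = 0
    # a = #{x=1, y=1}, b = #{x=1, y=0}: weight = a + b, dist = (n1 - a) + b.
    twice_b = weight + dist - n1
    if twice_b < 0 or twice_b % 2:
        return 0
    b = twice_b // 2
    a = weight - b
    if not (0 <= a <= n1 and 0 <= b <= n0):
        return 0
    return math.comb(n1, a) * math.comb(n0, b)

def brute_force_size(y, dist, weight):
    """Same count by exhaustive enumeration (small n only)."""
    count = 0
    for x in product((0, 1), repeat=len(y)):
        if sum(x) == weight and sum(xi != yi for xi, yi in zip(x, y)) == dist:
            count += 1
    return count

def entropy_lower_bound(m, k):
    """Lower bound 2^{m H(k/m)} / (m + 1) on the binomial coefficient C(m, k)."""
    return 2 ** (m * h2(k / m)) / (m + 1)

if __name__ == "__main__":
    y = (1, 0, 1, 1, 0, 0, 1, 0)
    print(type_class_size(y, 3, 3), brute_force_size(y, 3, 3))
    print(math.comb(20, 7), entropy_lower_bound(20, 7))
```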



Hence bounding $\Pr_C\left(\mathbf{X} \in \mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y})\right)$ from below in (13) is equivalent to bounding from above
$I(X_T; Y_T) + D(X_T\|X)$ over the "typical" types.

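Using the bounds on KL-divergences and entropy functions obtained via Taylor series expansions and
summarized in claims in the Appendix, we note that

$$\begin{aligned}
\max_{T} I(X_T; Y_T)
&= \max_{T}\ H(Y_T) - H(Y_T|X_T) && (14) \\
&\leq H(p_w * p + \Delta_y) - H(p_w - \Delta_{y|x}) && (15) \\
&= (p_w * p + \Delta_y) \log\frac{1}{p_w * p + \Delta_y} + (1 - p_w * p - \Delta_y) \log\frac{1}{1 - p_w * p - \Delta_y} \\
&\quad - (p_w - \Delta_{y|x}) \log\frac{1}{p_w - \Delta_{y|x}} - (1 - p_w + \Delta_{y|x}) \log\frac{1}{1 - p_w + \Delta_{y|x}} && (16) \\
&= \Delta_y \log\frac{1}{p_w * p + \Delta_y} - \Delta_y \log\frac{1}{1 - p_w * p - \Delta_y}
 + \Delta_{y|x} \log\frac{1}{p_w - \Delta_{y|x}} - \Delta_{y|x} \log\frac{1}{1 - p_w + \Delta_{y|x}} \\
&\quad + (p_w * p) \log\frac{1}{p_w * p + \Delta_y} + (1 - p_w * p) \log\frac{1}{1 - p_w * p - \Delta_y} \\
&\quad - p_w \log\frac{1}{p_w - \Delta_{y|x}} - (1 - p_w) \log\frac{1}{1 - p_w + \Delta_{y|x}} && (17) \\
&= \Delta \log\frac{(1 - p_w * p - \Delta)(1 - p_w + \Delta)}{(p_w * p + \Delta)(p_w - \Delta)} \\
&\quad + D(p_w * p \,\|\, p_w * p + \Delta) + H(p_w * p) - D(p_w \,\|\, p_w - \Delta) - H(p_w) && (18) \\
&= \Delta \log\frac{(1 - p_w * p - \Delta)(1 - p_w + \Delta)}{(p_w * p + \Delta)(p_w - \Delta)}
 + \frac{\Delta^2}{2}\left(\frac{1}{p_w * p} + \frac{1}{1 - p_w * p}\right) - \frac{\Delta^2}{2}\left(\frac{1}{p_w} + \frac{1}{1 - p_w}\right) + O(\Delta^3) \\
&\quad + D(p_w \,\|\, p_w * p) + p(1 - 2p_w) \log\frac{1 - p_w * p}{p_w * p} && (19) \\
&= \Delta \log\frac{(1 - p_w * p - \Delta)(1 - p_w + \Delta)}{(p_w * p + \Delta)(p_w - \Delta)}
 + p(1 - 2p_w) \log\frac{1 - p_w * p}{p_w * p} + O(\Delta^2) + O(p^2). && (20)
\end{aligned}$$

Here, equality (16) follows by expanding the entropy terms. We further expand equation (16) by
collecting the $\Delta$ terms to obtain (17). Choosing $\Delta = \Delta_{y|x} = \Delta_y$, we have equation (18). Equality
in (19) holds by using Claim 2 in the Appendix. By Claim 3 in the Appendix, we obtain (20).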
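Also,

$$\begin{aligned}
\max_{T_{X|Y}} D(X_T \| X)
&= D(p(1 + \Delta_x) \,\|\, p) && (21) \\
&= -H(p(1 + \Delta_x)) + H(p) + p\Delta_x \log\frac{1-p}{p} && (22) \\
&\leq -\frac{p^2\Delta_x^2}{2}\left(\frac{1}{p} + \frac{1}{1-p}\right) - p\Delta_x \log\frac{1 - p(1 + \Delta_x)}{p(1 + \Delta_x)} + p\Delta_x \log\frac{1-p}{p} && (23) \\
&= -\frac{p^2\Delta_x^2}{2}\left(\frac{1}{p} + \frac{1}{1-p}\right) + p\Delta_x \log\frac{(1-p)(1 + \Delta_x)}{1 - p(1 + \Delta_x)} && (24) \\
&= -\frac{p^2\Delta_x^2}{2}\left(\frac{1}{p} + \frac{1}{1-p}\right) + p\Delta_x \log\frac{1 - p + \Delta_x - p\Delta_x}{1 - p - p\Delta_x} && (25) \\
&\leq p\Delta_x \log\left(1 + \frac{\Delta_x}{1 - p(1 + \Delta_x)}\right) && (26) \\
&\leq p\Delta_x \cdot \frac{\Delta_x}{1 - p(1 + \Delta_x)} && (27) \\
&= \frac{p\Delta_x^2}{1 - p(1 + \Delta_x)} && (28) \\
&= O(p\Delta_x^2). && (29)
\end{aligned}$$

Equation (21) holds since the distribution maximizing the KL-divergence in the typical type-classes
corresponds to the extremal one. Equation (22) holds by expanding the KL divergence term. Using Claim 2
and Claim 4 we obtain equation (23). By the fact that $\log(1 + x) \leq x$, we obtain equation (27).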
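
As a sanity check on the end points of this chain, the sketch below compares $D(p(1+\Delta_x)\,\|\,p)$ with $p\Delta_x^2/(1 - p(1+\Delta_x))$ for a few illustrative parameter values. It assumes natural logarithms (the setting in which $\log(1+x) \leq x$ holds as stated); the parameter grid and the function names are assumptions made for the example.

```python
import math

def kl_nats(a, b):
    """Binary KL divergence D(a || b) in nats."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def weight_divergence_bound(p, delta_x):
    """Right-hand side p * delta_x^2 / (1 - p(1 + delta_x)) of the chain."""
    return p * delta_x ** 2 / (1 - p * (1 + delta_x))

if __name__ == "__main__":
    for p in (0.01, 0.05, 0.2):
        for delta_x in (0.05, 0.2, 0.5):
            lhs = kl_nats(p * (1 + delta_x), p)
            rhs = weight_divergence_bound(p, delta_x)
            print(p, delta_x, lhs, rhs)
```

In each case the divergence is of order $p\Delta_x^2$ and sits below the bound, consistent with (29).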

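Now, note that by Chernoff's bound,

$$\Pr\left(\left|\frac{\mathrm{wt}_H(\mathbf{y})}{n} - p_w * p\right| > \Delta_y\right) < 2e^{-2\Delta_y^2 n},
\qquad
\Pr\left(\left|\frac{d_H(\mathbf{x},\mathbf{y})}{n} - p_w\right| > \Delta_{y|x}\right) < 2e^{-2\Delta_{y|x}^2 n}.$$

Using $\Delta = \Delta_y = \Delta_{y|x} = \sqrt{\frac{\ln(2/\epsilon_2)}{2n}} = O\left(\frac{1}{\sqrt{n}}\right)$, we have

$$\Pr\left(\left|\frac{\mathrm{wt}_H(\mathbf{y})}{n} - p_w * p\right| > \Delta_y\right) < \epsilon_2,
\qquad
\Pr\left(\left|\frac{d_H(\mathbf{x},\mathbf{y})}{n} - p_w\right| > \Delta_{y|x}\right) < \epsilon_2.$$

Similarly, letting $\Delta_x = \sqrt{\frac{3\ln(2/\epsilon_3)}{np}} = \sqrt{\frac{3\ln(2/\epsilon_3)}{c_2\sqrt{n}}}$, we have

$$\Pr\left(|\mathrm{wt}_H(\mathbf{x}) - np| > n\Delta_x p\right) < 2e^{-\Delta_x^2 np/3} = \epsilon_3.$$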
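
The choices of slack can be made concrete with a short computation. This is a minimal sketch under assumed values ($n = 2000$, $\epsilon_2 = \epsilon_3 = 0.05$, $c_2 = 0.5$, and an output-bit probability of $0.32$ standing in for $p_w * p$); it solves the two tail bounds for $\Delta$ and $\Delta_x$ and compares them with exact binomial tails. The function names are hypothetical.

```python
import math

def additive_slack(n, eps2):
    """Delta solving 2 exp(-2 Delta^2 n) = eps2."""
    return math.sqrt(math.log(2 / eps2) / (2 * n))

def weight_slack(n, c2, eps3):
    """Delta_x solving 2 exp(-Delta_x^2 n p / 3) = eps3, with p = c2 / sqrt(n)."""
    p = c2 / math.sqrt(n)
    return math.sqrt(3 * math.log(2 / eps3) / (n * p))

def binom_pmf(n, k, q):
    """Binomial(n, q) pmf at k, evaluated in the log domain."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(q) + (n - k) * math.log(1 - q))
    return math.exp(log_pmf)

def two_sided_tail(n, q, slack):
    """Exact Pr(|W/n - q| > slack) for W ~ Binomial(n, q)."""
    return sum(binom_pmf(n, k, q) for k in range(n + 1) if abs(k / n - q) > slack)

if __name__ == "__main__":
    n, eps2, eps3, c2 = 2000, 0.05, 0.05, 0.5  # assumed values
    delta = additive_slack(n, eps2)
    delta_x = weight_slack(n, c2, eps3)
    p = c2 / math.sqrt(n)
    print(delta, two_sided_tail(n, 0.32, delta))        # tail of wt_H(y)/n
    print(delta_x, two_sided_tail(n, p, delta_x * p))   # tail of wt_H(x)/n
```

The exact tails are well below $\epsilon_2$ and $\epsilon_3$, as expected since the Chernoff-type bounds are not tight.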
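So, setting $c_4 = \sqrt{\frac{\ln(2/\epsilon_2)}{2}}$ (so that $\Delta = c_4/\sqrt{n}$), we have

$$\max I(X_T; Y_T) \leq \frac{c_4}{\sqrt{n}} \log\frac{(1 - p_w * p - \Delta)(1 - p_w + \Delta)}{(p_w * p + \Delta)(p_w - \Delta)} + \frac{c_2}{\sqrt{n}}(1 - 2p_w)\log\frac{1 - p_w * p}{p_w * p} + O\left(\frac{1}{n}\right).$$

Similarly, letting $c_5 = \sqrt{\frac{3\ln(2/\epsilon_3)}{c_2}}$ (so that $\Delta_x = c_5 n^{-1/4}$), we have $\max D(X_T \| X) = O\left(\frac{1}{n}\right)$.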
Let 

so 

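$$\mathbb{E}_C\left(|\mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y}) \cap C|\right) \geq \frac{1}{(n+1)^2}\, 2^{-c_6\sqrt{n} + O(1)}\, 2^{r\sqrt{n}}.$$

Therefore

$$\mathbb{E}_C\left(|\mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y}) \cap C|\right) \geq 2^{c_7\sqrt{n}} \quad \text{for } r > c_6, \tag{30}$$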

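for some $c_7 > 0$. Hence, we have

$$\Pr\left(\Big|\, |\mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y}) \cap C| - \mathbb{E}_C(|\mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y}) \cap C|) \,\Big| > \epsilon_4\, \mathbb{E}_C(|\mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y}) \cap C|)\right) < 2\exp\left(-2\epsilon_4^2\, 2^{c_7\sqrt{n}}\right).$$

This means that $|\mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y}) \cap C|$ is highly concentrated around $\mathbb{E}_C(|\mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y}) \cap C|)$ for each "typical"
$\mathbf{y}$. Note that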

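$$\begin{aligned}
\mathbb{E}_C(\Pr_1(\mathbf{y}))
&= \sum_{C} \Pr(C) \sum_{\mathbf{x} \in C} p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) \\
&= \sum_{C} \Pr(C) \sum_{dn,\, w_x n} |\mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y}) \cap C|\ p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) \\
&= \sum_{dn,\, w_x n}\ \sum_{C} \Pr(C)\, |\mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y}) \cap C|\ p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) \\
&= \sum_{dn,\, w_x n} \mathbb{E}\left(|\mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y}) \cap C|\right) p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}).
\end{aligned}$$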
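
The role of the lower bound (30) in the concentration step can be illustrated with exact binomial computations. The sketch below treats the number of codewords falling in one type class as Binomial($|C|$, $q$) (codewords drawn independently, each landing in the class with probability $q$) and compares the exact two-sided tail with the standard multiplicative Chernoff bound $2\exp(-\epsilon^2\mu/3)$. Both this particular bound and the numerical values are assumptions chosen for illustration; they are not the constants used in the proof.

```python
import math

def binom_pmf(n, k, q):
    """Binomial(n, q) pmf at k, evaluated in the log domain."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(q) + (n - k) * math.log(1 - q))
    return math.exp(log_pmf)

def occupancy_tail(num_codewords, q, eps):
    """Exact Pr(|B - mu| > eps * mu) for B ~ Binomial(num_codewords, q)."""
    mu = num_codewords * q
    return sum(binom_pmf(num_codewords, k, q)
               for k in range(num_codewords + 1) if abs(k - mu) > eps * mu)

def chernoff_two_sided(mu, eps):
    """Multiplicative Chernoff bound 2 exp(-eps^2 mu / 3), valid for 0 < eps < 1."""
    return 2 * math.exp(-eps * eps * mu / 3)

if __name__ == "__main__":
    q, eps = 0.05, 0.2  # assumed class probability and relative deviation
    for num_codewords in (2 ** 6, 2 ** 10, 2 ** 14):
        mu = num_codewords * q
        print(num_codewords, occupancy_tail(num_codewords, q, eps),
              chernoff_two_sided(mu, eps))
```

When the expected occupancy is small the bound is vacuous, while a large expected occupancy drives the deviation probability down rapidly; this is why the proof first shows that each typical type class contains many codewords in expectation.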
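Therefore, for all $\mathbf{y} \in \mathcal{T}_{\{w_y\}}(\mathbf{Y})$ and $\mathbf{x} \in \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y})$, with probability super-exponentially close to 1,

$$\begin{aligned}
&\sum_{\mathbf{y} \in \mathcal{T}_{\{w_y\}}(\mathbf{Y})} \Big| \sum_{\mathbf{x} \in \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y}) \cap C} p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) - \sum_{C} \Pr(C) \sum_{\mathbf{x} \in \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y}) \cap C} p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) \Big| \\
&= \sum_{\mathbf{y} \in \mathcal{T}_{\{w_y\}}(\mathbf{Y})} \Big| \sum_{dn,\, w_x n}\ \sum_{\mathbf{x} \in \mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y}) \cap C} p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) - \sum_{C} \Pr(C) \sum_{dn,\, w_x n}\ \sum_{\mathbf{x} \in \mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y}) \cap C} p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) \Big| \\
&= \sum_{\mathbf{y} \in \mathcal{T}_{\{w_y\}}(\mathbf{Y})} \Big| \sum_{dn,\, w_x n} |\mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y}) \cap C|\, p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) - \sum_{dn,\, w_x n} \mathbb{E}(|\mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y}) \cap C|)\, p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) \Big| \\
&\leq \sum_{\mathbf{y} \in \mathcal{T}_{\{w_y\}}(\mathbf{Y})}\ \sum_{dn,\, w_x n} \Big|\, |\mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y}) \cap C| - \mathbb{E}(|\mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y}) \cap C|) \,\Big|\, p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) \\
&\leq \sum_{\mathbf{y} \in \mathcal{T}_{\{w_y\}}(\mathbf{Y})}\ \sum_{dn,\, w_x n} \epsilon_4\, \mathbb{E}(|\mathcal{T}_{(d,w_x)}(\mathbf{X}|\mathbf{y}) \cap C|)\, p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) \\
&= \epsilon_4 \sum_{\mathbf{y} \in \mathcal{T}_{\{w_y\}}(\mathbf{Y})}\ \sum_{C} \Pr(C) \sum_{\mathbf{x} \in \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y}) \cap C} p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) \\
&\leq \epsilon_4. \qquad (31)
\end{aligned}$$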



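It remains for us to show that for $\mathbf{y} \notin \mathcal{T}_{\{w_y\}}(\mathbf{Y})$ or $\mathbf{x} \notin \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y})$, the contribution to the variational
distance is small. Note that

$$\begin{aligned}
\frac{1}{2} \sum_{\mathbf{y}} |\Pr_1(\mathbf{y}) - \mathbb{E}_C \Pr_1(\mathbf{y})|
&\leq \frac{1}{2} \sum_{\mathbf{y} \in \mathcal{T}_{\{w_y\}}(\mathbf{Y})} \Big| \sum_{\mathbf{x} \in \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y}) \cap C} p_{Z_w}(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) - \sum_{C} \Pr(C) \sum_{\mathbf{x} \in \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y}) \cap C} p_{Z_w}(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) \Big| \\
&\quad + \frac{1}{2} \sum_{\mathbf{y} \in \mathcal{T}_{\{w_y\}}(\mathbf{Y})} \Big| \sum_{\mathbf{x} \in C \setminus \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y})} p_{Z_w}(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) - \sum_{C} \Pr(C) \sum_{\mathbf{x} \in C \setminus \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y})} p_{Z_w}(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) \Big| \\
&\quad + \frac{1}{2} \sum_{\mathbf{y} \notin \mathcal{T}_{\{w_y\}}(\mathbf{Y})} |\Pr_1(\mathbf{y}) - \mathbb{E}_C \Pr_1(\mathbf{y})|.
\end{aligned}$$

Note that

$$\begin{aligned}
&\sum_{\mathbf{y} \in \mathcal{T}_{\{w_y\}}(\mathbf{Y})} \Big| \sum_{\mathbf{x} \in C \setminus \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y})} p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) - \sum_{C} \Pr(C) \sum_{\mathbf{x} \in C \setminus \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y})} p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) \Big|
 + \sum_{\mathbf{y} \notin \mathcal{T}_{\{w_y\}}(\mathbf{Y})} |\Pr_1(\mathbf{y}) - \mathbb{E}_C \Pr_1(\mathbf{y})| \\
&\leq \sum_{\mathbf{y} \in \mathcal{T}_{\{w_y\}}(\mathbf{Y})}\ \sum_{\mathbf{x} \in C \setminus \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y})} p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x})
 + \sum_{\mathbf{y} \in \mathcal{T}_{\{w_y\}}(\mathbf{Y})}\ \sum_{C} \Pr(C) \sum_{\mathbf{x} \in C \setminus \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y})} p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) \\
&\quad + \sum_{\mathbf{y} \notin \mathcal{T}_{\{w_y\}}(\mathbf{Y})} \Pr_1(\mathbf{y}) + \sum_{\mathbf{y} \notin \mathcal{T}_{\{w_y\}}(\mathbf{Y})} \mathbb{E}_C \Pr_1(\mathbf{y}) \\
&\leq \Pr\left(\mathbf{Y} \notin \mathcal{T}_{\{w_y\}}(\mathbf{Y}) \text{ or } \mathbf{X} \notin \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{Y})\right)
 + \mathbb{E}_C\left[\Pr\left(\mathbf{Y} \notin \mathcal{T}_{\{w_y\}}(\mathbf{Y}) \text{ or } \mathbf{X} \notin \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{Y})\right)\right].
\end{aligned}$$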
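Also,

$$\begin{aligned}
\mathbb{E}_C\left[\Pr\left(\mathbf{Y} \notin \mathcal{T}_{\{w_y\}}(\mathbf{Y}) \text{ or } \mathbf{X} \notin \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{Y})\right)\right]
&\leq \mathbb{E}_C\left[\Pr\left(\mathbf{Y} \notin \mathcal{T}_{\{w_y\}}(\mathbf{Y})\right)\right] + \mathbb{E}_C\left[\Pr\left(\mathbf{X} \notin \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{Y}) \,\big|\, \mathbf{Y}\right)\right] \\
&= \Pr\left(\mathbf{Y} \notin \mathcal{T}_{\{w_y\}}(\mathbf{Y})\right) + \Pr\left(\mathbf{X} \notin \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{Y}) \,\big|\, \mathbf{Y}\right) \\
&\leq \epsilon_2 + \epsilon_2 + \epsilon_3 \\
&= 2\epsilon_2 + \epsilon_3. \qquad (32)
\end{aligned}$$

And by inequalities (31) and (32), we have

$$\begin{aligned}
2\epsilon_2 + \epsilon_3 + \epsilon_4
&\geq \mathbb{E}_C\left[\Pr\left(\mathbf{Y} \notin \mathcal{T}_{\{w_y\}}(\mathbf{Y}) \text{ or } \mathbf{X} \notin \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{Y})\right)\right] \\
&\quad + \sum_{\mathbf{y} \in \mathcal{T}_{\{w_y\}}(\mathbf{Y})} \Big| \sum_{\mathbf{x} \in \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y}) \cap C} p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) - \sum_{C} \Pr(C) \sum_{\mathbf{x} \in \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y}) \cap C} p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) \Big| \\
&\geq \sum_{\mathbf{y} \in \mathcal{T}_{\{w_y\}}(\mathbf{Y})} \Big| \mathbb{E}_C \Pr_1(\mathbf{y}) - \sum_{\mathbf{x} \in \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y}) \cap C} p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) \Big| \\
&\geq \sum_{\mathbf{y} \in \mathcal{T}_{\{w_y\}}(\mathbf{Y})} \mathbb{E}_C \Pr_1(\mathbf{y}) - \sum_{\mathbf{y} \in \mathcal{T}_{\{w_y\}}(\mathbf{Y})}\ \sum_{\mathbf{x} \in \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y}) \cap C} p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) \\
&\geq 1 - \epsilon_2 - \sum_{\mathbf{y} \in \mathcal{T}_{\{w_y\}}(\mathbf{Y})}\ \sum_{\mathbf{x} \in \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{y}) \cap C} p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) \\
&= 1 - \epsilon_2 - \Pr\left(\mathbf{Y} \in \mathcal{T}_{\{w_y\}}(\mathbf{Y}) \text{ and } \mathbf{X} \in \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{Y})\right) \\
&= 1 - \epsilon_2 - \left[1 - \Pr\left(\mathbf{Y} \notin \mathcal{T}_{\{w_y\}}(\mathbf{Y}) \text{ or } \mathbf{X} \notin \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{Y})\right)\right] \\
&= \Pr\left(\mathbf{Y} \notin \mathcal{T}_{\{w_y\}}(\mathbf{Y}) \text{ or } \mathbf{X} \notin \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{Y})\right) - \epsilon_2.
\end{aligned}$$

Therefore,

$$\Pr\left(\mathbf{Y} \notin \mathcal{T}_{\{w_y\}}(\mathbf{Y}) \text{ or } \mathbf{X} \notin \mathcal{T}_{\{d,w_x\}}(\mathbf{X}|\mathbf{Y})\right) \leq 3\epsilon_2 + \epsilon_3 + \epsilon_4. \tag{33}$$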



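Hence by combining (31), (32) and (33), with probability super-exponentially close to 1, the variational
distance $V(\mathbb{E}_C(\Pr_1), \Pr_1)$ satisfies

$$\frac{1}{2} \sum_{\mathbf{y}} |\Pr_1(\mathbf{y}) - \mathbb{E}_C \Pr_1(\mathbf{y})| \leq \frac{1}{2}\left[\epsilon_4 + 2\epsilon_2 + \epsilon_3 + 3\epsilon_2 + \epsilon_3 + \epsilon_4\right] = \frac{1}{2}\left[5\epsilon_2 + 2\epsilon_3 + 2\epsilon_4\right]. \tag{34}$$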



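Substituting (34) and (10) into (9) and using (7) gives us that the deniability is at least

$$\alpha(\mathrm{Est}_C(\cdot)) + \beta(\mathrm{Est}_C(\cdot)) \geq 1 - \left(\frac{1}{2}\left[5\epsilon_2 + 2\epsilon_3 + 2\epsilon_4\right] + c(p_w)\, c_2\right).$$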

Recall that V(Pr;^,EcPr:) < '-^ = ei, with p = ^, and c^ = ^. 
By ^ and (|30]), we have 



C2logemin<{ i^^^'^ ^'^i.i^, ~^^)^(^ -5)\ >'r> c^, 



(35) 



(36) 



34 



where cg = log ^^ (2c4 + 02(1 - 2p^)). 



Note that 



log 



l-Pw , 1 + 2a 

= log 

Pw I -2a 

= log(l + 2a) - log(l - 2a) 

= ; — \2a + + Oia^) + \2a+ + + Oia*) 

ln2 V 2 3 ^ 7 ln2 V 2 3 ^ ' 

= hr2 ^^+-r + ^^^' 



1 
W2 



2(1 - 2p^) + ^^^ ,^^"'^' + 0((1 - 2pJ^) 



Therefore, we need that 

272ei 



-!- + 

Pw 1-P' 



= \oge(^--p^ {1-5) > ^[2{l-2p^) + 0{{l-2p, 



[I - 2pu 
To satisfy the above inequality, assuming 



2C4 + 



v^ei 



Pw l~Pw 



Pw > g, 



we need 



(37) 



V2 
So, we need that 



^ \oge{l-2pb)\l-6) > -^(l-2pj2/^2./— + -^J21n- + y2ei) 

In 2 \ \ Pw l-Pw\l £2 / 



'' '^ " -- - ^ 2a-2p.)^(2-|y^+v/2.,) 



^loge(l-2p,f(l-5) > — ( 

2v^, 



l_^.^;i-2p„r^3^21u-+e, 



Thus, 



We obtain, 



(l-Pb?> 



4(l-2p^)2(Ay^+l 



1 - 2pfe > 2 



\ 



21n^ + l 

1-5 



;i - 2p, 



In^+l 



./(Mil 

V l-<5 



Let Cg = c8(ei,e2) = 2 

Now, let ci = e2 = 63 = €4 = ^e, so ( [35] ) becomes 



1 _ l-2pb 



s . Finally, p^ > 2 - 2,^ 



a + /3>l-e=l- — ei. 
Hence Cg = \/ ^" i_/ — - 



33 



Also, due to (|36|) any relative throughput that satisfies log ^-^ ( 2 In ^ H ,'^f^^ ^ ) < r 



Vm 1— Piu , 

< I'^x 1 ^ gives us the desired probability of decay (notice the lower bound on the relative 

throughput arises due to the need to ensure that each typical type class has a super-linear number of 
codewords in expectation, so as to allow for sufficiently high probability of concentration). 

VIII. Proof of Theorem [4] 

To show Theorem 4, we first bound the deniability $\alpha + \beta$ from above for a constant composition code in which
every codeword has weight $k$. Then, for an arbitrary code, we bound the deniability $\alpha + \beta$ from above
by using the union bound on the constant composition codes. Throughout
the proof, Willie uses the following estimator:

• Set $T = 1$ if $\mathrm{wt}_H(\mathbf{y}_w) < (1 + \delta(\mathrm{wt}_H(\mathbf{x})))\, p_w\, \mathrm{wt}_H(\mathbf{x})$ for some $\mathbf{x}$;

• Set $T = 0$ otherwise.

Here, $\delta$ is a function of $\mathrm{wt}_H(\mathbf{x})$.
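
To give some intuition for why codeword weight matters, the sketch below evaluates a simpler weight-threshold detector exactly. This is not the estimator defined above: it is an assumed stand-in in which Willie declares "transmitting" whenever the Hamming weight of his observation exceeds the midpoint between the two expected weights. The parameters ($n = 2500$, $p_w = 0.3$) and the function names are illustrative assumptions.

```python
import math

def binom_pmf_list(n, q):
    """List of Binomial(n, q) probabilities for k = 0..n (log-domain evaluation)."""
    return [math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                     + k * math.log(q) + (n - k) * math.log(1 - q))
            for k in range(n + 1)]

def convolve(a, b):
    """Distribution of the sum of two independent integer-valued variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def threshold_detector_errors(n, k, p_w):
    """False-alarm and missed-detection probabilities of a midpoint weight test.

    Silent Alice: Willie's weight is Binomial(n, p_w).
    Weight-k codeword sent: Binomial(k, 1 - p_w) + Binomial(n - k, p_w).
    """
    tau = n * p_w + k * (1 - 2 * p_w) / 2  # midpoint between the two means
    silent = binom_pmf_list(n, p_w)
    active = convolve(binom_pmf_list(k, 1 - p_w), binom_pmf_list(n - k, p_w))
    alpha = sum(pr for wt, pr in enumerate(silent) if wt > tau)
    beta = sum(pr for wt, pr in enumerate(active) if wt <= tau)
    return alpha, beta

if __name__ == "__main__":
    n, p_w = 2500, 0.3  # assumed values
    for k in (5, 50, 354):  # 354 is roughly n^{3/4}; 50 is sqrt(n)
        alpha, beta = threshold_detector_errors(n, k, p_w)
        print(k, alpha, beta, alpha + beta)
```

For weights well above $\sqrt{n}$ the sum $\alpha + \beta$ collapses towards zero, while for weights well below $\sqrt{n}$ this detector does little better than guessing, in line with the square-root scaling discussed in the paper.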

First, for a constant composition code with the weight of the codewords equal to $k$, the probability of
false detection is

a = Pr(wt5j(y^) < (1 + 6{k))pwWtH{'x.) for some x) 

< iVfc(C)Pr(wt^(y^) < (1 + 6{k))p^k) (38) 

< A^fc(C)e"(^"P-(^+'^('=))-P™)'^ (39) 



Here, (38) holds by the union bound, and (39) holds by the Chernoff bound.
Similarly, if x is transmitted, 

/3(x) = Pr(wt^(yJ>(l + ,5(A;))p^wtH(x)) 

Hence, /3 = Ex/5(x)p(x) < e^s^^'^^'P'-^ Therefore, a + /3 < iVfc(C)e-(i-P»(i+^(^'»-P-)''= + e-5^(^)'p-'= 
for constant composition code with the weight of codewords equals to k. 

Now, consider an arbitrary code $C$. Let $f_i$ be the fraction of codewords in $C$ with weight equal to $i$.
So, the deniability is bounded from above as follows,

$$\alpha + \beta = \sum_{i} (\alpha + \beta)_i\, f_i.$$

If the code is to be $(1 - \epsilon)$-deniable, we need the following to hold,

n 

1 - e < a + /3 < ^ rAr.(C)e-(i-P»(^+''(*))~P»)'^ + e-s'^^^^'^^^ fi 

i=0 

■ 
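Appendix

Claim 1. The Taylor series expansion of $\log(1 + x)$ is $\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} + O(x^4)$.

Proof. By examining the Taylor series expansion of the logarithmic function.

Claim 2.

$$D(p \,\|\, p + x) \leq \frac{x^2}{2}\left(\frac{1}{p} + \frac{1}{1-p}\right) + O(x^3).$$

Proof.

$$\begin{aligned}
D(p \,\|\, p + x)
&= p \log\frac{p}{p + x} + (1 - p) \log\frac{1 - p}{1 - p - x} \\
&= -p \log\left(1 + \frac{x}{p}\right) - (1 - p) \log\left(1 - \frac{x}{1 - p}\right) \\
&= -p\left(\frac{x}{p} - \frac{x^2}{2p^2}\right) - (1 - p)\left(-\frac{x}{1 - p} - \frac{x^2}{2(1 - p)^2}\right) + O(x^3) \\
&= \frac{x^2}{2}\left(\frac{1}{p} + \frac{1}{1 - p}\right) + O(x^3).
\end{aligned}$$

Claim 3.

$$D(p \,\|\, p * x) \leq \frac{x^2 (1 - 2p)^2}{2}\left(\frac{1}{p} + \frac{1}{1 - p}\right) + O(x^3).$$

Proof. Note that $p * x = p + x(1 - 2p)$, and apply Claim 2.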
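
Claims 2 and 3 can be checked numerically. The sketch below assumes natural logarithms (for base-2 logarithms the quadratic term picks up a factor $1/\ln 2$) and illustrative values of $p$ and $x$; it compares the exact divergence with the quadratic term and reports the gap relative to $x^3$. The function names are hypothetical.

```python
import math

def star(a, b):
    """Binary convolution a * b = a(1 - b) + (1 - a)b."""
    return a * (1 - b) + (1 - a) * b

def kl_nats(a, b):
    """Binary KL divergence D(a || b) in nats."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def quadratic_term(p, x):
    """Leading term (x^2 / 2)(1/p + 1/(1-p)) from Claim 2."""
    return 0.5 * x * x * (1 / p + 1 / (1 - p))

if __name__ == "__main__":
    p = 0.3  # assumed value
    for x in (0.05, 0.01, 0.001):
        gap2 = kl_nats(p, p + x) - quadratic_term(p, x)
        shift = x * (1 - 2 * p)  # p * x = p + x(1 - 2p)
        gap3 = kl_nats(p, star(p, x)) - quadratic_term(p, shift)
        print(x, gap2 / x ** 3, gap3 / x ** 3)
```

The ratios stay bounded as $x$ shrinks, which is what the $O(x^3)$ remainder asserts.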

Claim 4.

$$H(p + x) - H(p) = D(p \,\|\, p + x) + x \log\frac{1 - p - x}{p + x}.$$

Proof.

$$\begin{aligned}
H(p + x) - H(p)
&= -(p + x)\log(p + x) - (1 - p - x)\log(1 - p - x) + p\log p + (1 - p)\log(1 - p) \\
&= p\log\frac{p}{p + x} + (1 - p)\log\frac{1 - p}{1 - p - x} + x\log\frac{1 - p - x}{p + x} \\
&= D(p \,\|\, p + x) + x\log\frac{1 - p - x}{p + x}.
\end{aligned}$$

Claim 5.

$$H(p * x) - H(p) = D(p \,\|\, p * x) + x(1 - 2p)\log\frac{1 - p * x}{p * x}.$$

Claim 6.

$$D(p + x \,\|\, p) = -H(p + x) + H(p) + x\log\frac{1 - p}{p}.$$

Proof.

$$\begin{aligned}
D(p + x \,\|\, p)
&= (p + x)\log\frac{p + x}{p} + (1 - p - x)\log\frac{1 - p - x}{1 - p} \\
&= -H(p + x) + (p + x)\log\frac{1}{p} + (1 - p - x)\log\frac{1}{1 - p} \\
&= -H(p + x) + H(p) + x\log\frac{1 - p}{p}.
\end{aligned}$$
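
The identities in Claims 4, 5 and 6 hold exactly for any logarithm base, so they are easy to verify numerically. The following sketch uses base-2 logarithms and arbitrary illustrative values of $p$ and $x$; the helper names are hypothetical.

```python
import math

def h2(q):
    """Binary entropy in bits."""
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def kl_bits(a, b):
    """Binary KL divergence D(a || b) in bits."""
    return a * math.log2(a / b) + (1 - a) * math.log2((1 - a) / (1 - b))

def star(a, b):
    """Binary convolution a * b = a(1 - b) + (1 - a)b."""
    return a * (1 - b) + (1 - a) * b

def claim4_gap(p, x):
    """LHS minus RHS of Claim 4."""
    return h2(p + x) - h2(p) - (kl_bits(p, p + x) + x * math.log2((1 - p - x) / (p + x)))

def claim5_gap(p, x):
    """LHS minus RHS of Claim 5."""
    q = star(p, x)
    return h2(q) - h2(p) - (kl_bits(p, q) + x * (1 - 2 * p) * math.log2((1 - q) / q))

def claim6_gap(p, x):
    """LHS minus RHS of Claim 6."""
    return kl_bits(p + x, p) - (-h2(p + x) + h2(p) + x * math.log2((1 - p) / p))

if __name__ == "__main__":
    for p, x in ((0.3, 0.05), (0.1, 0.2), (0.45, 0.01)):
        print(p, x, claim4_gap(p, x), claim5_gap(p, x), claim6_gap(p, x))
```

All three gaps are zero up to floating-point rounding.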



Claim 7 (Chernoff bound: Type I (Additive)). Let $X_1, X_2, \ldots, X_m$ be i.i.d. random variables with $\Pr(X_i = 1) = p$.
Let $\epsilon > 0$, then

$$\Pr\left(\frac{1}{m}\sum_{i=1}^{m} X_i \geq p + \epsilon\right) \leq e^{-2\epsilon^2 m}. \tag{40}$$

Claim 8 (Chernoff bound: Type II (Multiplicative)). Let $X_1, X_2, \ldots, X_m$ be i.i.d. random variables with
$\Pr(X_i = 1) = p$. Let $X = \sum_{i=1}^{m} X_i$, and $\mu = \mathbb{E}(X)$, then

$$\Pr\left(X \geq (1 + \delta)\mu\right) \leq e^{-\delta^2\mu/3} \tag{41}$$

for $0 < \delta < 1$.



